From: Steven Vormwald (sdvormwa_at_mtu_dot_edu)
Date: Mon Jan 26 2009 - 15:32:23 PST
Hello,
I've come across an odd problem that seems to come up only with structs
containing 2-dimensional arrays of size 1x1. The attached code provides
an example. When run with N=1 on 4 threads, it unexpectedly prints:
0 1
0 0 1
1 2 3
0 1
0 0 1
1 2 3
0 1
0 0 704643072
1 2752512 10752
instead of
0 1
0 0 1
1 2 3
0 1
0 0 1
1 2 3
0 1
0 2 3
1 6 11
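As a cross-check on the expected values, here is a minimal serial C sketch (plain C, not the attached UPC code). It assumes the 4 threads form a 2x2 grid of NxN blocks and that block t of both A and B is filled with the value t, which is what the printed inputs suggest. The names GRID, block_value, expected_c and print_expected are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define GRID 2  /* assumption: 4 threads arranged as a 2x2 grid of blocks */

/* Assumption: every element of block (block_row, block_col) in A and B
   holds the block's index, i.e. the owning thread number. */
static int block_value(int block_row, int block_col)
{
    return block_row * GRID + block_col;
}

/* Expected C[row][col] for block size n, computed as an ordinary
   (serial) matrix product over the full (GRID*n) x (GRID*n) matrices. */
int expected_c(int n, int row, int col)
{
    assert(n > 0);
    int sum = 0;
    for (int k = 0; k < GRID * n; k++) {
        int a = block_value(row / n, k / n);  /* A[row][k] */
        int b = block_value(k / n, col / n);  /* B[k][col] */
        sum += a * b;
    }
    return sum;
}

/* Print the expected C, one row per line, prefixed by the row index. */
void print_expected(int n)
{
    int dim = GRID * n;
    for (int r = 0; r < dim; r++) {
        printf("%d", r);
        for (int c = 0; c < dim; c++)
            printf(" %d", expected_c(n, r, c));
        printf("\n");
    }
}
```

Under those assumptions, print_expected(1) gives the rows "0 2 3" and "1 6 11", matching the expected output above. It does not print the column header line.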
Any value of N other than 1 produces the correct output for the given
number of threads. Odder still is what the program prints once I enable
its debugging output:
0 1
0 0 1
1 2 3
0 1
0 0 1
1 2 3
C[00].local_block[00][00] = 0 + 0 * 0
= 0 + 0
= 0
C[00].local_block[00][00] = 0 + 1 * 2
= 0 + 0
= 0
C[01].local_block[00][00] = 0 + 0 * 1
= 0 + 0
= 704643072
C[01].local_block[00][00] = 704643072 + 1 * 3
= 704643072 + 0
= 704643072
C[02].local_block[00][00] = 0 + 2 * 0
= 0 + 0
= 2752512
C[02].local_block[00][00] = 2752512 + 3 * 2
= 2752512 + 0
= 2752512
C[03].local_block[00][00] = 0 + 2 * 1
= 0 + 0
= 10752
C[03].local_block[00][00] = 10752 + 3 * 3
= 10752 + 0
= 10752
0 1
0 0 704643072
1 2752512 10752
Note that the values of A[] and B[] are printed correctly on the first
line of each entry, but the product on the second line is printed as 0
even when both operands are nonzero, and the value stored in C[] on the
third line is wrong.
Changing the code to use floats or doubles instead of ints generates
similar problems. However, if the arrays are allocated dynamically with
upc_all_alloc(), the program works correctly. I tested the code on
versions 2.4 (mpi), 2.6 (smp,ibv), and 2.8 (smp,ibv) of the Berkeley UPC
compiler, all of which produce the same problem. I haven't been able to
test it on another machine, so it might be a configuration issue, or a
problem with the local C compiler (gcc (GCC) 3.4.6 20060404 (Red Hat
3.4.6-3)).
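One detail in the incorrect N=1 values may help whoever looks into this: each of them is the number 42 shifted left by a whole number of bytes (704643072 = 42 << 24, 2752512 = 42 << 16, 10752 = 42 << 8). This does not by itself identify the cause, and the output does not say where the 42 comes from. The C sketch below only shows the decoding. It assumes a 32-bit unsigned int, and byte_at and print_decoded are illustrative names, not part of the attached code.

```c
#include <assert.h>
#include <stdio.h>

/* Return byte `index` of `value`, where index 0 is the least
   significant byte (assumes a 32-bit unsigned int). */
unsigned byte_at(unsigned value, int index)
{
    assert(index >= 0 && index < 4);
    return (value >> (8 * index)) & 0xFFu;
}

/* Print a value in decimal, in hex, and as its four bytes,
   most significant byte first. */
void print_decoded(unsigned value)
{
    printf("%10u = 0x%08X = bytes %u %u %u %u\n", value, value,
           byte_at(value, 3), byte_at(value, 2),
           byte_at(value, 1), byte_at(value, 0));
}
```

Calling print_decoded on 704643072, 2752512 and 10752 shows the single nonzero byte 42 moving one byte position down for each successive thread (bytes 42 0 0 0, then 0 42 0 0, then 0 0 42 0).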
Attached is the source code that was used, the output of 'upcc -version'
for each of the versions of the compiler used, as well as the output run
on 4 threads with N=1,2. I fixed the order of the output lines so they
lined up properly, but otherwise did not change the output.
Steven Vormwald
0 1 2 3
0 0 0 1 1
1 0 0 1 1
2 2 2 3 3
3 2 2 3 3
0 1 2 3
0 0 0 1 1
1 0 0 1 1
2 2 2 3 3
3 2 2 3 3
C[00].local_block[00][00] = 0 + 0 * 0
= 0 + 0
= 0
C[00].local_block[00][00] = 0 + 0 * 0
= 0 + 0
= 0
C[00].local_block[00][01] = 0 + 0 * 0
= 0 + 0
= 0
C[00].local_block[00][01] = 0 + 0 * 0
= 0 + 0
= 0
C[00].local_block[01][00] = 0 + 0 * 0
= 0 + 0
= 0
C[00].local_block[01][00] = 0 + 0 * 0
= 0 + 0
= 0
C[00].local_block[01][01] = 0 + 0 * 0
= 0 + 0
= 0
C[00].local_block[01][01] = 0 + 0 * 0
= 0 + 0
= 0
C[00].local_block[00][00] = 0 + 1 * 2
= 0 + 2
= 2
C[00].local_block[00][00] = 2 + 1 * 2
= 2 + 2
= 4
C[00].local_block[00][01] = 0 + 1 * 2
= 0 + 2
= 2
C[00].local_block[00][01] = 2 + 1 * 2
= 2 + 2
= 4
C[00].local_block[01][00] = 0 + 1 * 2
= 0 + 2
= 2
C[00].local_block[01][00] = 2 + 1 * 2
= 2 + 2
= 4
C[00].local_block[01][01] = 0 + 1 * 2
= 0 + 2
= 2
C[00].local_block[01][01] = 2 + 1 * 2
= 2 + 2
= 4
C[01].local_block[00][00] = 0 + 0 * 1
= 0 + 0
= 0
C[01].local_block[00][00] = 0 + 0 * 1
= 0 + 0
= 0
C[01].local_block[00][01] = 0 + 0 * 1
= 0 + 0
= 0
C[01].local_block[00][01] = 0 + 0 * 1
= 0 + 0
= 0
C[01].local_block[01][00] = 0 + 0 * 1
= 0 + 0
= 0
C[01].local_block[01][00] = 0 + 0 * 1
= 0 + 0
= 0
C[01].local_block[01][01] = 0 + 0 * 1
= 0 + 0
= 0
C[01].local_block[01][01] = 0 + 0 * 1
= 0 + 0
= 0
C[01].local_block[00][00] = 0 + 1 * 3
= 0 + 3
= 3
C[01].local_block[00][00] = 3 + 1 * 3
= 3 + 3
= 6
C[01].local_block[00][01] = 0 + 1 * 3
= 0 + 3
= 3
C[01].local_block[00][01] = 3 + 1 * 3
= 3 + 3
= 6
C[01].local_block[01][00] = 0 + 1 * 3
= 0 + 3
= 3
C[01].local_block[01][00] = 3 + 1 * 3
= 3 + 3
= 6
C[01].local_block[01][01] = 0 + 1 * 3
= 0 + 3
= 3
C[01].local_block[01][01] = 3 + 1 * 3
= 3 + 3
= 6
C[02].local_block[00][00] = 0 + 2 * 0
= 0 + 0
= 0
C[02].local_block[00][00] = 0 + 2 * 0
= 0 + 0
= 0
C[02].local_block[00][01] = 0 + 2 * 0
= 0 + 0
= 0
C[02].local_block[00][01] = 0 + 2 * 0
= 0 + 0
= 0
C[02].local_block[01][00] = 0 + 2 * 0
= 0 + 0
= 0
C[02].local_block[01][00] = 0 + 2 * 0
= 0 + 0
= 0
C[02].local_block[01][01] = 0 + 2 * 0
= 0 + 0
= 0
C[02].local_block[01][01] = 0 + 2 * 0
= 0 + 0
= 0
C[02].local_block[00][00] = 0 + 3 * 2
= 0 + 6
= 6
C[02].local_block[00][00] = 6 + 3 * 2
= 6 + 6
= 12
C[02].local_block[00][01] = 0 + 3 * 2
= 0 + 6
= 6
C[02].local_block[00][01] = 6 + 3 * 2
= 6 + 6
= 12
C[02].local_block[01][00] = 0 + 3 * 2
= 0 + 6
= 6
C[02].local_block[01][00] = 6 + 3 * 2
= 6 + 6
= 12
C[02].local_block[01][01] = 0 + 3 * 2
= 0 + 6
= 6
C[02].local_block[01][01] = 6 + 3 * 2
= 6 + 6
= 12
C[03].local_block[00][00] = 0 + 2 * 1
= 0 + 2
= 2
C[03].local_block[00][00] = 2 + 2 * 1
= 2 + 2
= 4
C[03].local_block[00][01] = 0 + 2 * 1
= 0 + 2
= 2
C[03].local_block[00][01] = 2 + 2 * 1
= 2 + 2
= 4
C[03].local_block[01][00] = 0 + 2 * 1
= 0 + 2
= 2
C[03].local_block[01][00] = 2 + 2 * 1
= 2 + 2
= 4
C[03].local_block[01][01] = 0 + 2 * 1
= 0 + 2
= 2
C[03].local_block[01][01] = 2 + 2 * 1
= 2 + 2
= 4
C[03].local_block[00][00] = 4 + 3 * 3
= 4 + 9
= 13
C[03].local_block[00][00] = 13 + 3 * 3
= 13 + 9
= 22
C[03].local_block[00][01] = 4 + 3 * 3
= 4 + 9
= 13
C[03].local_block[00][01] = 13 + 3 * 3
= 13 + 9
= 22
C[03].local_block[01][00] = 4 + 3 * 3
= 4 + 9
= 13
C[03].local_block[01][00] = 13 + 3 * 3
= 13 + 9
= 22
C[03].local_block[01][01] = 4 + 3 * 3
= 4 + 9
= 13
C[03].local_block[01][01] = 13 + 3 * 3
= 13 + 9
= 22
0 1 2 3
0 4 4 6 6
1 4 4 6 6
2 12 12 22 22
3 12 12 22 22
This is upcc (the Berkeley Unified Parallel C compiler), v. 2.4.0
(getting remote translator settings...)
----------------------+---------------------------------------------------------
UPC Runtime | v. 2.4.0, built on Oct 25 2007 at 15:37:52
----------------------+---------------------------------------------------------
UPC-to-C translator | v. 2.4.0, built on Oct 31 2006 at 14:53:03
----------------------+---------------------------------------------------------
Translator location | http://upc-translator.lbl.gov/upcc-2.4.0.cgi
----------------------+---------------------------------------------------------
networks supported | smp mpi
----------------------+---------------------------------------------------------
default network | mpi
----------------------+---------------------------------------------------------
pthreads support | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
Configured with | '--prefix=/usr/local/berkeley_upc-2.4.0' 'CC=mpicc'
| 'MPI_CC=mpicc' '--disable-udp'
| '--with-sptr-packed-bits=22,8,34'
----------------------+---------------------------------------------------------
Configure features | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
| upc_memcpy_async,upc_ptradd,upc_thread_distance,
| upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
| upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
| upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
| segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
| packedsptr
----------------------+---------------------------------------------------------
Configure id | gilbert.cse.mtu.edu Thu Oct 25 15:35:50 EDT 2007 root
----------------------+---------------------------------------------------------
Binary interface | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
Runtime interface # | Runtime supports 3.0 -> 3.8: Translator uses 3.6
----------------------+---------------------------------------------------------
| --- BACKEND SETTINGS (for mpi network) ---
----------------------+---------------------------------------------------------
C compiler | /usr/local/mpi/bin/mpicc
| GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
| gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
----------------------+---------------------------------------------------------
C compiler flags | -O3 --param max-inline-insns-single=35000 --param
| inline-unit-growth=10000 --param
| large-function-growth=200000 -Winline
----------------------+---------------------------------------------------------
linker | /usr/local/mpi/bin/mpicc
| GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
| gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
----------------------+---------------------------------------------------------
linker flags | -O3 --param max-inline-insns-single=35000 --param
| inline-unit-growth=10000 --param
| large-function-growth=200000 -Winline
| -L/usr/local/berkeley_upc-2.4.0/lib -lupcr-mpi-seq
| -lumalloc -L/usr/local/berkeley_upc-2.4.0/lib
| -lgasnet-mpi-seq -lammpi
| -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
----------------------+---------------------------------------------------------
This is upcc (the Berkeley Unified Parallel C compiler), v. 2.6.0
(getting remote translator settings...)
----------------------+---------------------------------------------------------
UPC Runtime | v. 2.6.0, built on Mar 14 2008 at 15:11:37
----------------------+---------------------------------------------------------
UPC-to-C translator | v. 2.6.0, built on Oct 15 2007 at 15:50:19
----------------------+---------------------------------------------------------
Translator location | http://upc-translator.lbl.gov/upcc-2.6.0.cgi
----------------------+---------------------------------------------------------
networks supported | smp ibv
----------------------+---------------------------------------------------------
default network | ibv
----------------------+---------------------------------------------------------
pthreads support | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
Configured with | '--with-translator=http://upc-translator.lbl.gov/upcc-2
| .6.0.cgi' '--enable-ibv' '--disable-mpi'
| '--disable-udp' '--with-ibv-spawner=ssh'
| '--disable-mpi-compat'
| '--prefix=/usr/local/berkeley_upc-2.6.0//opt'
| '--with-multiconf-magic=opt'
----------------------+---------------------------------------------------------
Configure features | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
| upc_memcpy_async,upc_ptradd,upc_thread_distance,
| upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
| upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
| upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
| segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
| packedsptr
----------------------+---------------------------------------------------------
Configure id | gilbert.cse.mtu.edu Fri Mar 14 15:07:10 EDT 2008
| sdvormwa
----------------------+---------------------------------------------------------
Binary interface | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
Runtime interface # | Runtime supports 3.0 -> 3.9: Translator uses 3.6
----------------------+---------------------------------------------------------
| --- BACKEND SETTINGS (for ibv network) ---
----------------------+---------------------------------------------------------
C compiler | /usr/bin/gcc
| GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
| gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
| Reading specs from
| /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
| Configured with: ../configure --prefix=/usr
| --mandir=/usr/share/man --infodir=/usr/share/info
| --enable-shared --enable-threads=posix
| --disable-checking --with-system-zlib
| --enable-__cxa_atexit --disable-libunwind-exceptions
| --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
C compiler flags | -O3 --param max-inline-insns-single=35000 --param
| inline-unit-growth=10000 --param
| large-function-growth=200000 -Winline
----------------------+---------------------------------------------------------
linker | /usr/bin/gcc
| GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
| gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
| Reading specs from
| /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
| Configured with: ../configure --prefix=/usr
| --mandir=/usr/share/man --infodir=/usr/share/info
| --enable-shared --enable-threads=posix
| --disable-checking --with-system-zlib
| --enable-__cxa_atexit --disable-libunwind-exceptions
| --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
linker flags | -O3 --param max-inline-insns-single=35000 --param
| inline-unit-growth=10000 --param
| large-function-growth=200000 -Winline
| -L/usr/local/berkeley_upc-2.6.0//opt/lib -lupcr-ibv-seq
| -lumalloc -L/usr/local/berkeley_upc-2.6.0//opt/lib
| -L/usr/lib64 -lgasnet-ibv-seq -libverbs -lpthread
| -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
----------------------+---------------------------------------------------------
This is upcc (the Berkeley Unified Parallel C compiler), v. 2.8.0
(getting remote translator settings...)
----------------------+---------------------------------------------------------
UPC Runtime | v. 2.8.0, built on Nov 20 2008 at 14:17:45
----------------------+---------------------------------------------------------
UPC-to-C translator | v. 2.8.0, built on Nov 5 2008 at 14:09:55
| host aphid linux-x86_64/64
| gcc v4.2.4 (Ubuntu 4.2.4-1ubuntu3)
----------------------+---------------------------------------------------------
Translator location | http://upc-translator.lbl.gov/upcc-2.8.0.cgi
----------------------+---------------------------------------------------------
networks supported | smp ibv
----------------------+---------------------------------------------------------
default network | ibv
----------------------+---------------------------------------------------------
pthreads support | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
Configured with | '--with-translator=http://upc-translator.lbl.gov/upcc-2
| .8.0.cgi' '--enable-ibv' '--disable-mpi'
| '--disable-udp' '--disable-mpi-compat'
| '--prefix=/usr/local/berkeley_upc-2.8.0/opt'
| '--with-multiconf-magic=opt'
----------------------+---------------------------------------------------------
Configure features | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
| upc_memcpy_async,upc_ptradd,upc_thread_distance,
| upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
| upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
| upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
| segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
| packedsptr
----------------------+---------------------------------------------------------
Configure id | gilbert.cse.mtu.edu Thu Nov 20 14:08:40 EST 2008
| sdvormwa
----------------------+---------------------------------------------------------
Binary interface | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
Runtime interface # | Runtime supports 3.0 -> 3.10: Translator uses 3.6
----------------------+---------------------------------------------------------
| --- BACKEND SETTINGS (for ibv network) ---
----------------------+---------------------------------------------------------
C compiler | /usr/bin/gcc
| GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
| gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
| Reading specs from
| /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
| Configured with: ../configure --prefix=/usr
| --mandir=/usr/share/man --infodir=/usr/share/info
| --enable-shared --enable-threads=posix
| --disable-checking --with-system-zlib
| --enable-__cxa_atexit --disable-libunwind-exceptions
| --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
C compiler flags | -O3 --param max-inline-insns-single=35000 --param
| inline-unit-growth=10000 --param
| large-function-growth=200000 -Winline
----------------------+---------------------------------------------------------
linker | /usr/bin/gcc
| GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
| gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
| Reading specs from
| /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
| Configured with: ../configure --prefix=/usr
| --mandir=/usr/share/man --infodir=/usr/share/info
| --enable-shared --enable-threads=posix
| --disable-checking --with-system-zlib
| --enable-__cxa_atexit --disable-libunwind-exceptions
| --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
linker flags | -O3 --param max-inline-insns-single=35000 --param
| inline-unit-growth=10000 --param
| large-function-growth=200000 -Winline
| -L/usr/local/berkeley_upc-2.8.0/opt/lib -lupcr-ibv-seq
| -lumalloc -L/usr/local/berkeley_upc-2.8.0/opt/lib
| -L/usr/lib64 -lgasnet-ibv-seq -libverbs -lpthread
| -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
----------------------+---------------------------------------------------------