From: Steven Vormwald (sdvormwa_at_mtu_dot_edu)
Date: Mon Jan 26 2009 - 15:32:23 PST
Hello,

I've come across an odd problem that seems to come up only with structs containing 2-dimensional arrays of size 1x1. The attached code provides an example of this. When run with N=1 (and 4 threads), the output is unexpectedly:

    0 1
    0 0 1
    1 2 3
    0 1
    0 0 1
    1 2 3
    0 1
    0 0 704643072
    1 2752512 10752

instead of:

    0 1
    0 0 1
    1 2 3
    0 1
    0 0 1
    1 2 3
    0 1
    0 2 3
    1 6 11

Using any value of N other than 1 generates the correct output for the number of threads.

Even odder is the output when I enable the debugging output in the code:

    0 1
    0 0 1
    1 2 3
    0 1
    0 0 1
    1 2 3
    C[00].local_block[00][00] = 0 + 0 * 0 = 0 + 0 = 0
    C[00].local_block[00][00] = 0 + 1 * 2 = 0 + 0 = 0
    C[01].local_block[00][00] = 0 + 0 * 1 = 0 + 0 = 704643072
    C[01].local_block[00][00] = 704643072 + 1 * 3 = 704643072 + 0 = 704643072
    C[02].local_block[00][00] = 0 + 2 * 0 = 0 + 0 = 2752512
    C[02].local_block[00][00] = 2752512 + 3 * 2 = 2752512 + 0 = 2752512
    C[03].local_block[00][00] = 0 + 2 * 1 = 0 + 0 = 10752
    C[03].local_block[00][00] = 10752 + 3 * 3 = 10752 + 0 = 10752
    0 1
    0 0 704643072
    1 2752512 10752

Note that the values of A[] and B[] are printed correctly on the first line, but the results of the multiplication and of the store into C[] are incorrect. Changing the code to use floats or doubles instead of ints produces similar problems. However, if the arrays are allocated dynamically with upc_all_alloc(), the program works correctly.

I tested the code on versions 2.4 (mpi), 2.6 (smp,ibv), and 2.8 (smp,ibv) of the Berkeley UPC compiler, all of which show the same problem. I haven't been able to test it on another machine, so it might be a configuration issue, or a problem with the local C compiler (gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3)).

Attached are the source code that was used, the output of 'upcc -version' for each of the compiler versions used, and the output of runs on 4 threads with N=1 and N=2. I fixed the order of the output lines so they lined up properly, but otherwise did not change the output.
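As a sanity check on the expected values quoted above, the product can be recomputed serially. The sketch below is plain C, not UPC, and it is not the attached program: the names `owner`, `reference_product`, and `print_labeled` are hypothetical, and it assumes (from the printed A and B matrices) that the 4 threads form a 2x2 grid of NxN blocks and that every element of a block holds the id of the thread that owns it.

```c
#include <stdio.h>

#define GRID 2                  /* assumption: 4 threads as a 2x2 grid of blocks */
#define MAX_N 4                 /* largest block size this sketch supports */
#define MAX_DIM (GRID * MAX_N)

/* Value stored at global position (i, j) for block size n: the id of the
   block (thread) that owns it. Inferred from the posted A and B matrices. */
int owner(int i, int j, int n) {
    return (i / n) * GRID + (j / n);
}

/* Serially compute C = A * B for block size n (1 <= n <= MAX_N),
   where A and B are both filled by owner(). */
void reference_product(int n, int c[MAX_DIM][MAX_DIM]) {
    int dim = GRID * n;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            int sum = 0;
            for (int k = 0; k < dim; k++)
                sum += owner(i, k, n) * owner(k, j, n);
            c[i][j] = sum;
        }
    }
}

/* Print C with a column-index header and a row index in front of each row,
   the same arrangement as the matrices in the posted output. */
void print_labeled(int n, int c[MAX_DIM][MAX_DIM]) {
    int dim = GRID * n;
    for (int j = 0; j < dim; j++)
        printf("%d ", j);
    printf("\n");
    for (int i = 0; i < dim; i++) {
        printf("%d", i);
        for (int j = 0; j < dim; j++)
            printf(" %d", c[i][j]);
        printf("\n");
    }
}
```

Calling `reference_product(1, c)` gives the rows 2 3 and 6 11, which is the expected C for N=1; with N=2 it gives the rows 4 4 6 6 (twice) and 12 12 22 22 (twice), matching the N=2 run that does work.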
Steven Vormwald

0 1
0 0 1
1 2 3
0 1
0 0 1
1 2 3
C[00].local_block[00][00] = 0 + 0 * 0 = 0 + 0 = 0
C[00].local_block[00][00] = 0 + 1 * 2 = 0 + 0 = 0
C[01].local_block[00][00] = 0 + 0 * 1 = 0 + 0 = 704643072
C[01].local_block[00][00] = 704643072 + 1 * 3 = 704643072 + 0 = 704643072
C[02].local_block[00][00] = 0 + 2 * 0 = 0 + 0 = 2752512
C[02].local_block[00][00] = 2752512 + 3 * 2 = 2752512 + 0 = 2752512
C[03].local_block[00][00] = 0 + 2 * 1 = 0 + 0 = 10752
C[03].local_block[00][00] = 10752 + 3 * 3 = 10752 + 0 = 10752
0 1
0 0 704643072
1 2752512 10752

0 1 2 3
0 0 0 1 1
1 0 0 1 1
2 2 2 3 3
3 2 2 3 3
0 1 2 3
0 0 0 1 1
1 0 0 1 1
2 2 2 3 3
3 2 2 3 3
C[00].local_block[00][00] = 0 + 0 * 0 = 0 + 0 = 0
C[00].local_block[00][00] = 0 + 0 * 0 = 0 + 0 = 0
C[00].local_block[00][01] = 0 + 0 * 0 = 0 + 0 = 0
C[00].local_block[00][01] = 0 + 0 * 0 = 0 + 0 = 0
C[00].local_block[01][00] = 0 + 0 * 0 = 0 + 0 = 0
C[00].local_block[01][00] = 0 + 0 * 0 = 0 + 0 = 0
C[00].local_block[01][01] = 0 + 0 * 0 = 0 + 0 = 0
C[00].local_block[01][01] = 0 + 0 * 0 = 0 + 0 = 0
C[00].local_block[00][00] = 0 + 1 * 2 = 0 + 2 = 2
C[00].local_block[00][00] = 2 + 1 * 2 = 2 + 2 = 4
C[00].local_block[00][01] = 0 + 1 * 2 = 0 + 2 = 2
C[00].local_block[00][01] = 2 + 1 * 2 = 2 + 2 = 4
C[00].local_block[01][00] = 0 + 1 * 2 = 0 + 2 = 2
C[00].local_block[01][00] = 2 + 1 * 2 = 2 + 2 = 4
C[00].local_block[01][01] = 0 + 1 * 2 = 0 + 2 = 2
C[00].local_block[01][01] = 2 + 1 * 2 = 2 + 2 = 4
C[01].local_block[00][00] = 0 + 0 * 1 = 0 + 0 = 0
C[01].local_block[00][00] = 0 + 0 * 1 = 0 + 0 = 0
C[01].local_block[00][01] = 0 + 0 * 1 = 0 + 0 = 0
C[01].local_block[00][01] = 0 + 0 * 1 = 0 + 0 = 0
C[01].local_block[01][00] = 0 + 0 * 1 = 0 + 0 = 0
C[01].local_block[01][00] = 0 + 0 * 1 = 0 + 0 = 0
C[01].local_block[01][01] = 0 + 0 * 1 = 0 + 0 = 0
C[01].local_block[01][01] = 0 + 0 * 1 = 0 + 0 = 0
C[01].local_block[00][00] = 0 + 1 * 3 = 0 + 3 = 3
C[01].local_block[00][00] = 3 + 1 * 3 = 3 + 3 = 6
C[01].local_block[00][01] = 0 + 1 * 3 = 0 + 3 = 3
C[01].local_block[00][01] = 3 + 1 * 3 = 3 + 3 = 6
C[01].local_block[01][00] = 0 + 1 * 3 = 0 + 3 = 3
C[01].local_block[01][00] = 3 + 1 * 3 = 3 + 3 = 6
C[01].local_block[01][01] = 0 + 1 * 3 = 0 + 3 = 3
C[01].local_block[01][01] = 3 + 1 * 3 = 3 + 3 = 6
C[02].local_block[00][00] = 0 + 2 * 0 = 0 + 0 = 0
C[02].local_block[00][00] = 0 + 2 * 0 = 0 + 0 = 0
C[02].local_block[00][01] = 0 + 2 * 0 = 0 + 0 = 0
C[02].local_block[00][01] = 0 + 2 * 0 = 0 + 0 = 0
C[02].local_block[01][00] = 0 + 2 * 0 = 0 + 0 = 0
C[02].local_block[01][00] = 0 + 2 * 0 = 0 + 0 = 0
C[02].local_block[01][01] = 0 + 2 * 0 = 0 + 0 = 0
C[02].local_block[01][01] = 0 + 2 * 0 = 0 + 0 = 0
C[02].local_block[00][00] = 0 + 3 * 2 = 0 + 6 = 6
C[02].local_block[00][00] = 6 + 3 * 2 = 6 + 6 = 12
C[02].local_block[00][01] = 0 + 3 * 2 = 0 + 6 = 6
C[02].local_block[00][01] = 6 + 3 * 2 = 6 + 6 = 12
C[02].local_block[01][00] = 0 + 3 * 2 = 0 + 6 = 6
C[02].local_block[01][00] = 6 + 3 * 2 = 6 + 6 = 12
C[02].local_block[01][01] = 0 + 3 * 2 = 0 + 6 = 6
C[02].local_block[01][01] = 6 + 3 * 2 = 6 + 6 = 12
C[03].local_block[00][00] = 0 + 2 * 1 = 0 + 2 = 2
C[03].local_block[00][00] = 2 + 2 * 1 = 2 + 2 = 4
C[03].local_block[00][01] = 0 + 2 * 1 = 0 + 2 = 2
C[03].local_block[00][01] = 2 + 2 * 1 = 2 + 2 = 4
C[03].local_block[01][00] = 0 + 2 * 1 = 0 + 2 = 2
C[03].local_block[01][00] = 2 + 2 * 1 = 2 + 2 = 4
C[03].local_block[01][01] = 0 + 2 * 1 = 0 + 2 = 2
C[03].local_block[01][01] = 2 + 2 * 1 = 2 + 2 = 4
C[03].local_block[00][00] = 4 + 3 * 3 = 4 + 9 = 13
C[03].local_block[00][00] = 13 + 3 * 3 = 13 + 9 = 22
C[03].local_block[00][01] = 4 + 3 * 3 = 4 + 9 = 13
C[03].local_block[00][01] = 13 + 3 * 3 = 13 + 9 = 22
C[03].local_block[01][00] = 4 + 3 * 3 = 4 + 9 = 13
C[03].local_block[01][00] = 13 + 3 * 3 = 13 + 9 = 22
C[03].local_block[01][01] = 4 + 3 * 3 = 4 + 9 = 13
C[03].local_block[01][01] = 13 + 3 * 3 = 13 + 9 = 22
0 1 2 3
0 4 4 6 6
1 4 4 6 6
2 12 12 22 22
3 12 12 22 22

This is upcc (the Berkeley Unified Parallel C compiler), v.
2.4.0 (getting remote translator settings...)
----------------------+---------------------------------------------------------
UPC Runtime           | v. 2.4.0, built on Oct 25 2007 at 15:37:52
----------------------+---------------------------------------------------------
UPC-to-C translator   | v. 2.4.0, built on Oct 31 2006 at 14:53:03
----------------------+---------------------------------------------------------
Translator location   | http://upc-translator.lbl.gov/upcc-2.4.0.cgi
----------------------+---------------------------------------------------------
networks supported    | smp mpi
----------------------+---------------------------------------------------------
default network       | mpi
----------------------+---------------------------------------------------------
pthreads support      | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
Configured with       | '--prefix=/usr/local/berkeley_upc-2.4.0' 'CC=mpicc'
                      | 'MPI_CC=mpicc' '--disable-udp'
                      | '--with-sptr-packed-bits=22,8,34'
----------------------+---------------------------------------------------------
Configure features    | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
                      | upc_memcpy_async,upc_ptradd,upc_thread_distance,
                      | upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
                      | upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
                      | upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
                      | segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
                      | packedsptr
----------------------+---------------------------------------------------------
Configure id          | gilbert.cse.mtu.edu Thu Oct 25 15:35:50 EDT 2007 root
----------------------+---------------------------------------------------------
Binary interface      | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
Runtime interface #   | Runtime supports 3.0 -> 3.8: Translator uses 3.6
----------------------+---------------------------------------------------------
                      | --- BACKEND SETTINGS (for mpi network) ---
----------------------+---------------------------------------------------------
C compiler            | /usr/local/mpi/bin/mpicc
                      | GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      | gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
----------------------+---------------------------------------------------------
C compiler flags      | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
----------------------+---------------------------------------------------------
linker                | /usr/local/mpi/bin/mpicc
                      | GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      | gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
----------------------+---------------------------------------------------------
linker flags          | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
                      | -L/usr/local/berkeley_upc-2.4.0/lib -lupcr-mpi-seq
                      | -lumalloc -L/usr/local/berkeley_upc-2.4.0/lib
                      | -lgasnet-mpi-seq -lammpi
                      | -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
----------------------+---------------------------------------------------------

This is upcc (the Berkeley Unified Parallel C compiler), v. 2.6.0 (getting remote translator settings...)
----------------------+---------------------------------------------------------
UPC Runtime           | v. 2.6.0, built on Mar 14 2008 at 15:11:37
----------------------+---------------------------------------------------------
UPC-to-C translator   | v. 2.6.0, built on Oct 15 2007 at 15:50:19
----------------------+---------------------------------------------------------
Translator location   | http://upc-translator.lbl.gov/upcc-2.6.0.cgi
----------------------+---------------------------------------------------------
networks supported    | smp ibv
----------------------+---------------------------------------------------------
default network       | ibv
----------------------+---------------------------------------------------------
pthreads support      | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
Configured with       | '--with-translator=http://upc-translator.lbl.gov/upcc-2
                      | .6.0.cgi' '--enable-ibv' '--disable-mpi'
                      | '--disable-udp' '--with-ibv-spawner=ssh'
                      | '--disable-mpi-compat'
                      | '--prefix=/usr/local/berkeley_upc-2.6.0//opt'
                      | '--with-multiconf-magic=opt'
----------------------+---------------------------------------------------------
Configure features    | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
                      | upc_memcpy_async,upc_ptradd,upc_thread_distance,
                      | upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
                      | upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
                      | upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
                      | segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
                      | packedsptr
----------------------+---------------------------------------------------------
Configure id          | gilbert.cse.mtu.edu Fri Mar 14 15:07:10 EDT 2008
                      | sdvormwa
----------------------+---------------------------------------------------------
Binary interface      | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
Runtime interface #   | Runtime supports 3.0 -> 3.9: Translator uses 3.6
----------------------+---------------------------------------------------------
                      | --- BACKEND SETTINGS (for ibv network) ---
----------------------+---------------------------------------------------------
C compiler            | /usr/bin/gcc
                      | GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      | gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                      | Reading specs from
                      | /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                      | Configured with: ../configure --prefix=/usr
                      | --mandir=/usr/share/man --infodir=/usr/share/info
                      | --enable-shared --enable-threads=posix
                      | --disable-checking --with-system-zlib
                      | --enable-__cxa_atexit --disable-libunwind-exceptions
                      | --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
C compiler flags      | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
----------------------+---------------------------------------------------------
linker                | /usr/bin/gcc
                      | GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      | gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                      | Reading specs from
                      | /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                      | Configured with: ../configure --prefix=/usr
                      | --mandir=/usr/share/man --infodir=/usr/share/info
                      | --enable-shared --enable-threads=posix
                      | --disable-checking --with-system-zlib
                      | --enable-__cxa_atexit --disable-libunwind-exceptions
                      | --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
linker flags          | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
                      | -L/usr/local/berkeley_upc-2.6.0//opt/lib -lupcr-ibv-seq
                      | -lumalloc -L/usr/local/berkeley_upc-2.6.0//opt/lib
                      | -L/usr/lib64 -lgasnet-ibv-seq -libverbs -lpthread
                      | -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
----------------------+---------------------------------------------------------

This is upcc (the Berkeley Unified Parallel C compiler), v. 2.8.0 (getting remote translator settings...)
----------------------+---------------------------------------------------------
UPC Runtime           | v. 2.8.0, built on Nov 20 2008 at 14:17:45
----------------------+---------------------------------------------------------
UPC-to-C translator   | v. 2.8.0, built on Nov 5 2008 at 14:09:55
                      | host aphid linux-x86_64/64
                      | gcc v4.2.4 (Ubuntu 4.2.4-1ubuntu3)
----------------------+---------------------------------------------------------
Translator location   | http://upc-translator.lbl.gov/upcc-2.8.0.cgi
----------------------+---------------------------------------------------------
networks supported    | smp ibv
----------------------+---------------------------------------------------------
default network       | ibv
----------------------+---------------------------------------------------------
pthreads support      | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
Configured with       | '--with-translator=http://upc-translator.lbl.gov/upcc-2
                      | .8.0.cgi' '--enable-ibv' '--disable-mpi'
                      | '--disable-udp' '--disable-mpi-compat'
                      | '--prefix=/usr/local/berkeley_upc-2.8.0/opt'
                      | '--with-multiconf-magic=opt'
----------------------+---------------------------------------------------------
Configure features    | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
                      | upc_memcpy_async,upc_ptradd,upc_thread_distance,
                      | upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
                      | upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
                      | upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
                      | segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
                      | packedsptr
----------------------+---------------------------------------------------------
Configure id          | gilbert.cse.mtu.edu Thu Nov 20 14:08:40 EST 2008
                      | sdvormwa
----------------------+---------------------------------------------------------
Binary interface      | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
Runtime interface #   | Runtime supports 3.0 -> 3.10: Translator uses 3.6
----------------------+---------------------------------------------------------
                      | --- BACKEND SETTINGS (for ibv network) ---
----------------------+---------------------------------------------------------
C compiler            | /usr/bin/gcc
                      | GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      | gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                      | Reading specs from
                      | /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                      | Configured with: ../configure --prefix=/usr
                      | --mandir=/usr/share/man --infodir=/usr/share/info
                      | --enable-shared --enable-threads=posix
                      | --disable-checking --with-system-zlib
                      | --enable-__cxa_atexit --disable-libunwind-exceptions
                      | --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
C compiler flags      | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
----------------------+---------------------------------------------------------
linker                | /usr/bin/gcc
                      | GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      | gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                      | Reading specs from
                      | /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                      | Configured with: ../configure --prefix=/usr
                      | --mandir=/usr/share/man --infodir=/usr/share/info
                      | --enable-shared --enable-threads=posix
                      | --disable-checking --with-system-zlib
                      | --enable-__cxa_atexit --disable-libunwind-exceptions
                      | --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
linker flags          | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
                      | -L/usr/local/berkeley_upc-2.8.0/opt/lib -lupcr-ibv-seq
                      | -lumalloc -L/usr/local/berkeley_upc-2.8.0/opt/lib
                      | -L/usr/lib64 -lgasnet-ibv-seq -libverbs -lpthread
                      | -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
----------------------+---------------------------------------------------------