Odd problem with shared array

From: Steven Vormwald (sdvormwa_at_mtu_dot_edu)
Date: Mon Jan 26 2009 - 15:32:23 PST

  • Next message: Paul H. Hargrove: "Re: Odd problem with shared array"
    Hello,
    
    I've come across an odd problem that seems to come up only with structs 
    containing 2-dimensional arrays of size 1x1.  The attached code provides 
    an example.  When run with N=1 (and 4 threads), the output is 
    unexpectedly:
            0       1
    0       0       1
    1       2       3
    
            0       1
    0       0       1
    1       2       3
    
            0       1
    0       0       704643072
    1       2752512 10752
    
    instead of
    
            0       1
    0       0       1
    1       2       3
    
            0       1
    0       0       1
    1       2       3
    
            0       1
    0       2       3
    1       6       11
    
    Any value of N other than 1 produces the correct output for the given 
    number of threads.  Odder still is what happens when I enable the 
    debugging output in the code:
            0       1
    0       0       1
    1       2       3
    
            0       1
    0       0       1
    1       2       3
    
    C[00].local_block[00][00] = 0 + 0 * 0
                              = 0 + 0
                              = 0
    C[00].local_block[00][00] = 0 + 1 * 2
                              = 0 + 0
                              = 0
    C[01].local_block[00][00] = 0 + 0 * 1
                              = 0 + 0
                              = 704643072
    C[01].local_block[00][00] = 704643072 + 1 * 3
                              = 704643072 + 0
                              = 704643072
    C[02].local_block[00][00] = 0 + 2 * 0
                              = 0 + 0
                              = 2752512
    C[02].local_block[00][00] = 2752512 + 3 * 2
                              = 2752512 + 0
                              = 2752512
    C[03].local_block[00][00] = 0 + 2 * 1
                              = 0 + 0
                              = 10752
    C[03].local_block[00][00] = 10752 + 3 * 3
                              = 10752 + 0
                              = 10752
    
            0       1
    0       0       704643072
    1       2752512 10752
    
    Note that the values of A[] and B[] are printed correctly on the first 
    line of each entry, but the product on the second line is always 0 and 
    the value stored in C[] is incorrect.
    
    Changing the code to use floats or doubles instead of ints produces 
    similar problems.  However, if the arrays are allocated dynamically with 
    upc_all_alloc(), the program works correctly.  I tested the code with 
    versions 2.4 (mpi), 2.6 (smp,ibv), and 2.8 (smp,ibv) of the Berkeley UPC 
    compiler, all of which show the same problem.  I haven't been able to 
    test it on another machine, so it might be a configuration issue or a 
    problem with the local C compiler (gcc (GCC) 3.4.6 20060404 (Red Hat 
    3.4.6-3)).
    
    Attached are the source code that was used, the output of 'upcc -version' 
    for each version of the compiler, and the output from runs on 4 threads 
    with N=1 and N=2.  I reordered the output lines so that they line up 
    properly, but otherwise did not change the output.
    
    Steven Vormwald
    
    	0	1
    0	0	1
    1	2	3
    
    	0	1
    0	0	1
    1	2	3
    
    C[00].local_block[00][00] = 0 + 0 * 0
                              = 0 + 0
                              = 0
    C[00].local_block[00][00] = 0 + 1 * 2
                              = 0 + 0
                              = 0
    C[01].local_block[00][00] = 0 + 0 * 1
                              = 0 + 0
                              = 704643072
    C[01].local_block[00][00] = 704643072 + 1 * 3
                              = 704643072 + 0
                              = 704643072
    C[02].local_block[00][00] = 0 + 2 * 0
                              = 0 + 0
                              = 2752512
    C[02].local_block[00][00] = 2752512 + 3 * 2
                              = 2752512 + 0
                              = 2752512
    C[03].local_block[00][00] = 0 + 2 * 1
                              = 0 + 0
                              = 10752
    C[03].local_block[00][00] = 10752 + 3 * 3
                              = 10752 + 0
                              = 10752
    
    	0	1
    0	0	704643072
    1	2752512	10752
    
    
    	0	1	2	3
    0	0	0	1	1
    1	0	0	1	1
    2	2	2	3	3
    3	2	2	3	3
    
    	0	1	2	3
    0	0	0	1	1
    1	0	0	1	1
    2	2	2	3	3
    3	2	2	3	3
    
    C[00].local_block[00][00] = 0 + 0 * 0
                              = 0 + 0
                              = 0
    C[00].local_block[00][00] = 0 + 0 * 0
                              = 0 + 0
                              = 0
    C[00].local_block[00][01] = 0 + 0 * 0
                              = 0 + 0
                              = 0
    C[00].local_block[00][01] = 0 + 0 * 0
                              = 0 + 0
                              = 0
    C[00].local_block[01][00] = 0 + 0 * 0
                              = 0 + 0
                              = 0
    C[00].local_block[01][00] = 0 + 0 * 0
                              = 0 + 0
                              = 0
    C[00].local_block[01][01] = 0 + 0 * 0
                              = 0 + 0
                              = 0
    C[00].local_block[01][01] = 0 + 0 * 0
                              = 0 + 0
                              = 0
    C[00].local_block[00][00] = 0 + 1 * 2
                              = 0 + 2
                              = 2
    C[00].local_block[00][00] = 2 + 1 * 2
                              = 2 + 2
                              = 4
    C[00].local_block[00][01] = 0 + 1 * 2
                              = 0 + 2
                              = 2
    C[00].local_block[00][01] = 2 + 1 * 2
                              = 2 + 2
                              = 4
    C[00].local_block[01][00] = 0 + 1 * 2
                              = 0 + 2
                              = 2
    C[00].local_block[01][00] = 2 + 1 * 2
                              = 2 + 2
                              = 4
    C[00].local_block[01][01] = 0 + 1 * 2
                              = 0 + 2
                              = 2
    C[00].local_block[01][01] = 2 + 1 * 2
                              = 2 + 2
                              = 4
    
    C[01].local_block[00][00] = 0 + 0 * 1
                              = 0 + 0
                              = 0
    C[01].local_block[00][00] = 0 + 0 * 1
                              = 0 + 0
                              = 0
    C[01].local_block[00][01] = 0 + 0 * 1
                              = 0 + 0
                              = 0
    C[01].local_block[00][01] = 0 + 0 * 1
                              = 0 + 0
                              = 0
    C[01].local_block[01][00] = 0 + 0 * 1
                              = 0 + 0
                              = 0
    C[01].local_block[01][00] = 0 + 0 * 1
                              = 0 + 0
                              = 0
    C[01].local_block[01][01] = 0 + 0 * 1
                              = 0 + 0
                              = 0
    C[01].local_block[01][01] = 0 + 0 * 1
                              = 0 + 0
                              = 0
    C[01].local_block[00][00] = 0 + 1 * 3
                              = 0 + 3
                              = 3
    C[01].local_block[00][00] = 3 + 1 * 3
                              = 3 + 3
                              = 6
    C[01].local_block[00][01] = 0 + 1 * 3
                              = 0 + 3
                              = 3
    C[01].local_block[00][01] = 3 + 1 * 3
                              = 3 + 3
                              = 6
    C[01].local_block[01][00] = 0 + 1 * 3
                              = 0 + 3
                              = 3
    C[01].local_block[01][00] = 3 + 1 * 3
                              = 3 + 3
                              = 6
    C[01].local_block[01][01] = 0 + 1 * 3
                              = 0 + 3
                              = 3
    C[01].local_block[01][01] = 3 + 1 * 3
                              = 3 + 3
                              = 6
    
    C[02].local_block[00][00] = 0 + 2 * 0
                              = 0 + 0
                              = 0
    C[02].local_block[00][00] = 0 + 2 * 0
                              = 0 + 0
                              = 0
    C[02].local_block[00][01] = 0 + 2 * 0
                              = 0 + 0
                              = 0
    C[02].local_block[00][01] = 0 + 2 * 0
                              = 0 + 0
                              = 0
    C[02].local_block[01][00] = 0 + 2 * 0
                              = 0 + 0
                              = 0
    C[02].local_block[01][00] = 0 + 2 * 0
                              = 0 + 0
                              = 0
    C[02].local_block[01][01] = 0 + 2 * 0
                              = 0 + 0
                              = 0
    C[02].local_block[01][01] = 0 + 2 * 0
                              = 0 + 0
                              = 0
    C[02].local_block[00][00] = 0 + 3 * 2
                              = 0 + 6
                              = 6
    C[02].local_block[00][00] = 6 + 3 * 2
                              = 6 + 6
                              = 12
    C[02].local_block[00][01] = 0 + 3 * 2
                              = 0 + 6
                              = 6
    C[02].local_block[00][01] = 6 + 3 * 2
                              = 6 + 6
                              = 12
    C[02].local_block[01][00] = 0 + 3 * 2
                              = 0 + 6
                              = 6
    C[02].local_block[01][00] = 6 + 3 * 2
                              = 6 + 6
                              = 12
    C[02].local_block[01][01] = 0 + 3 * 2
                              = 0 + 6
                              = 6
    C[02].local_block[01][01] = 6 + 3 * 2
                              = 6 + 6
                              = 12
    
    C[03].local_block[00][00] = 0 + 2 * 1
                              = 0 + 2
                              = 2
    C[03].local_block[00][00] = 2 + 2 * 1
                              = 2 + 2
                              = 4
    C[03].local_block[00][01] = 0 + 2 * 1
                              = 0 + 2
                              = 2
    C[03].local_block[00][01] = 2 + 2 * 1
                              = 2 + 2
                              = 4
    C[03].local_block[01][00] = 0 + 2 * 1
                              = 0 + 2
                              = 2
    C[03].local_block[01][00] = 2 + 2 * 1
                              = 2 + 2
                              = 4
    C[03].local_block[01][01] = 0 + 2 * 1
                              = 0 + 2
                              = 2
    C[03].local_block[01][01] = 2 + 2 * 1
                              = 2 + 2
                              = 4
    C[03].local_block[00][00] = 4 + 3 * 3
                              = 4 + 9
                              = 13
    C[03].local_block[00][00] = 13 + 3 * 3
                              = 13 + 9
                              = 22
    C[03].local_block[00][01] = 4 + 3 * 3
                              = 4 + 9
                              = 13
    C[03].local_block[00][01] = 13 + 3 * 3
                              = 13 + 9
                              = 22
    C[03].local_block[01][00] = 4 + 3 * 3
                              = 4 + 9
                              = 13
    C[03].local_block[01][00] = 13 + 3 * 3
                              = 13 + 9
                              = 22
    C[03].local_block[01][01] = 4 + 3 * 3
                              = 4 + 9
                              = 13
    C[03].local_block[01][01] = 13 + 3 * 3
                              = 13 + 9
                              = 22
    
    	0	1	2	3
    0	4	4	6	6
    1	4	4	6	6
    2	12	12	22	22
    3	12	12	22	22
    
    
    This is upcc (the Berkeley Unified Parallel C compiler), v. 2.4.0
      (getting remote translator settings...)
    ----------------------+---------------------------------------------------------
     UPC Runtime          | v. 2.4.0, built on Oct 25 2007 at 15:37:52
    ----------------------+---------------------------------------------------------
     UPC-to-C translator  | v. 2.4.0, built on Oct 31 2006 at 14:53:03
    ----------------------+---------------------------------------------------------
     Translator location  | http://upc-translator.lbl.gov/upcc-2.4.0.cgi
    ----------------------+---------------------------------------------------------
     networks supported   | smp mpi
    ----------------------+---------------------------------------------------------
     default network      | mpi
    ----------------------+---------------------------------------------------------
     pthreads support     | available (if used, default is 2 pthreads per process)
    ----------------------+---------------------------------------------------------
     Configured with      | '--prefix=/usr/local/berkeley_upc-2.4.0' 'CC=mpicc'
                          | 'MPI_CC=mpicc' '--disable-udp'
                          | '--with-sptr-packed-bits=22,8,34'
    ----------------------+---------------------------------------------------------
     Configure features   | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
                          | upc_memcpy_async,upc_ptradd,upc_thread_distance,
                          | upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
                          | upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
                          | upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
                          | segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
                          | packedsptr
    ----------------------+---------------------------------------------------------
     Configure id         | gilbert.cse.mtu.edu Thu Oct 25 15:35:50 EDT 2007 root
    ----------------------+---------------------------------------------------------
     Binary interface     | 64-bit x86_64-unknown-linux-gnu
    ----------------------+---------------------------------------------------------
     Runtime interface #  | Runtime supports 3.0 -> 3.8: Translator uses 3.6
    ----------------------+---------------------------------------------------------
                          |  --- BACKEND SETTINGS (for mpi network) ---
    ----------------------+---------------------------------------------------------
     C compiler           | /usr/local/mpi/bin/mpicc
                          |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                          |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
    ----------------------+---------------------------------------------------------
     C compiler flags     | -O3 --param max-inline-insns-single=35000 --param
                          | inline-unit-growth=10000 --param
                          | large-function-growth=200000 -Winline
    ----------------------+---------------------------------------------------------
     linker               | /usr/local/mpi/bin/mpicc
                          |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                          |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
    ----------------------+---------------------------------------------------------
     linker flags         | -O3 --param max-inline-insns-single=35000 --param
                          | inline-unit-growth=10000 --param
                          | large-function-growth=200000 -Winline
                          | -L/usr/local/berkeley_upc-2.4.0/lib -lupcr-mpi-seq
                          | -lumalloc -L/usr/local/berkeley_upc-2.4.0/lib
                          | -lgasnet-mpi-seq -lammpi
                          | -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
    ----------------------+---------------------------------------------------------
    
    This is upcc (the Berkeley Unified Parallel C compiler), v. 2.6.0
      (getting remote translator settings...)
    ----------------------+---------------------------------------------------------
     UPC Runtime          | v. 2.6.0, built on Mar 14 2008 at 15:11:37
    ----------------------+---------------------------------------------------------
     UPC-to-C translator  | v. 2.6.0, built on Oct 15 2007 at 15:50:19
    ----------------------+---------------------------------------------------------
     Translator location  | http://upc-translator.lbl.gov/upcc-2.6.0.cgi
    ----------------------+---------------------------------------------------------
     networks supported   | smp ibv
    ----------------------+---------------------------------------------------------
     default network      | ibv
    ----------------------+---------------------------------------------------------
     pthreads support     | available (if used, default is 2 pthreads per process)
    ----------------------+---------------------------------------------------------
     Configured with      | '--with-translator=http://upc-translator.lbl.gov/upcc-2
                          | .6.0.cgi' '--enable-ibv' '--disable-mpi'
                          | '--disable-udp' '--with-ibv-spawner=ssh'
                          | '--disable-mpi-compat'
                          | '--prefix=/usr/local/berkeley_upc-2.6.0//opt'
                          | '--with-multiconf-magic=opt'
    ----------------------+---------------------------------------------------------
     Configure features   | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
                          | upc_memcpy_async,upc_ptradd,upc_thread_distance,
                          | upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
                          | upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
                          | upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
                          | segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
                          | packedsptr
    ----------------------+---------------------------------------------------------
     Configure id         | gilbert.cse.mtu.edu Fri Mar 14 15:07:10 EDT 2008
                          | sdvormwa
    ----------------------+---------------------------------------------------------
     Binary interface     | 64-bit x86_64-unknown-linux-gnu
    ----------------------+---------------------------------------------------------
     Runtime interface #  | Runtime supports 3.0 -> 3.9: Translator uses 3.6
    ----------------------+---------------------------------------------------------
                          |  --- BACKEND SETTINGS (for ibv network) ---
    ----------------------+---------------------------------------------------------
     C compiler           | /usr/bin/gcc
                          |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                          |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                          |   Reading specs from
                          |   /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                          |   Configured with: ../configure --prefix=/usr
                          |   --mandir=/usr/share/man --infodir=/usr/share/info
                          |   --enable-shared --enable-threads=posix
                          |   --disable-checking --with-system-zlib
                          |   --enable-__cxa_atexit --disable-libunwind-exceptions
                          |   --enable-java-awt=gtk --host=x86_64-redhat-linux
    ----------------------+---------------------------------------------------------
     C compiler flags     | -O3 --param max-inline-insns-single=35000 --param
                          | inline-unit-growth=10000 --param
                          | large-function-growth=200000 -Winline
    ----------------------+---------------------------------------------------------
     linker               | /usr/bin/gcc
                          |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                          |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                          |   Reading specs from
                          |   /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                          |   Configured with: ../configure --prefix=/usr
                          |   --mandir=/usr/share/man --infodir=/usr/share/info
                          |   --enable-shared --enable-threads=posix
                          |   --disable-checking --with-system-zlib
                          |   --enable-__cxa_atexit --disable-libunwind-exceptions
                          |   --enable-java-awt=gtk --host=x86_64-redhat-linux
    ----------------------+---------------------------------------------------------
     linker flags         | -O3 --param max-inline-insns-single=35000 --param
                          | inline-unit-growth=10000 --param
                          | large-function-growth=200000 -Winline
                          | -L/usr/local/berkeley_upc-2.6.0//opt/lib -lupcr-ibv-seq
                          | -lumalloc -L/usr/local/berkeley_upc-2.6.0//opt/lib
                          | -L/usr/lib64 -lgasnet-ibv-seq -libverbs -lpthread
                          | -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
    ----------------------+---------------------------------------------------------
    
    This is upcc (the Berkeley Unified Parallel C compiler), v. 2.8.0
      (getting remote translator settings...)
    ----------------------+---------------------------------------------------------
     UPC Runtime          | v. 2.8.0, built on Nov 20 2008 at 14:17:45
    ----------------------+---------------------------------------------------------
     UPC-to-C translator  | v. 2.8.0, built on Nov  5 2008 at 14:09:55
                          | host aphid linux-x86_64/64
                          | gcc v4.2.4 (Ubuntu 4.2.4-1ubuntu3)
    ----------------------+---------------------------------------------------------
     Translator location  | http://upc-translator.lbl.gov/upcc-2.8.0.cgi
    ----------------------+---------------------------------------------------------
     networks supported   | smp ibv
    ----------------------+---------------------------------------------------------
     default network      | ibv
    ----------------------+---------------------------------------------------------
     pthreads support     | available (if used, default is 2 pthreads per process)
    ----------------------+---------------------------------------------------------
     Configured with      | '--with-translator=http://upc-translator.lbl.gov/upcc-2
                          | .8.0.cgi' '--enable-ibv' '--disable-mpi'
                          | '--disable-udp' '--disable-mpi-compat'
                          | '--prefix=/usr/local/berkeley_upc-2.8.0/opt'
                          | '--with-multiconf-magic=opt'
    ----------------------+---------------------------------------------------------
     Configure features   | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
                          | upc_memcpy_async,upc_ptradd,upc_thread_distance,
                          | upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
                          | upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
                          | upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
                          | segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
                          | packedsptr
    ----------------------+---------------------------------------------------------
     Configure id         | gilbert.cse.mtu.edu Thu Nov 20 14:08:40 EST 2008
                          | sdvormwa
    ----------------------+---------------------------------------------------------
     Binary interface     | 64-bit x86_64-unknown-linux-gnu
    ----------------------+---------------------------------------------------------
     Runtime interface #  | Runtime supports 3.0 -> 3.10: Translator uses 3.6
    ----------------------+---------------------------------------------------------
                          |  --- BACKEND SETTINGS (for ibv network) ---
    ----------------------+---------------------------------------------------------
     C compiler           | /usr/bin/gcc
                          |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                          |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                          |   Reading specs from
                          |   /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                          |   Configured with: ../configure --prefix=/usr
                          |   --mandir=/usr/share/man --infodir=/usr/share/info
                          |   --enable-shared --enable-threads=posix
                          |   --disable-checking --with-system-zlib
                          |   --enable-__cxa_atexit --disable-libunwind-exceptions
                          |   --enable-java-awt=gtk --host=x86_64-redhat-linux
    ----------------------+---------------------------------------------------------
     C compiler flags     | -O3 --param max-inline-insns-single=35000 --param
                          | inline-unit-growth=10000 --param
                          | large-function-growth=200000 -Winline
    ----------------------+---------------------------------------------------------
     linker               | /usr/bin/gcc
                          |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                          |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                          |   Reading specs from
                          |   /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                          |   Configured with: ../configure --prefix=/usr
                          |   --mandir=/usr/share/man --infodir=/usr/share/info
                          |   --enable-shared --enable-threads=posix
                          |   --disable-checking --with-system-zlib
                          |   --enable-__cxa_atexit --disable-libunwind-exceptions
                          |   --enable-java-awt=gtk --host=x86_64-redhat-linux
    ----------------------+---------------------------------------------------------
     linker flags         | -O3 --param max-inline-insns-single=35000 --param
                          | inline-unit-growth=10000 --param
                          | large-function-growth=200000 -Winline
                          | -L/usr/local/berkeley_upc-2.8.0/opt/lib -lupcr-ibv-seq
                          | -lumalloc -L/usr/local/berkeley_upc-2.8.0/opt/lib
                          | -L/usr/lib64 -lgasnet-ibv-seq -libverbs -lpthread
                          | -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
    ----------------------+---------------------------------------------------------
    
    

