bupc timing on VMs

From: Nikita Andreev (lestat_at_kemsu.ru)
Date: Wed Apr 14 2010 - 01:17:19 PDT

    Hello,
    
    I'm doing some research on a homemade 2-node cluster. Each node is actually a 2-way virtual machine, and both VMs run on a single host system with a dual-core processor.
    
    I'm testing a time synchronization algorithm originally developed by the PPW team (thanks to them for their support). This code (see attachment) works perfectly on a physical cluster, but when I run it on VMs it gives wrong results. In the attached application I sync all threads to thread 0 twice. Sometimes it turns out that the time on a syncing thread (one that was placed on a different node than thread 0) has run ahead of the master thread 0.
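
For readers without the attachment, the general idea of syncing one clock to a master can be sketched roughly as below. This is not the PPW code: it is a generic round-trip (ping-pong) offset estimate, and the names `estimate_offset`, `read_local` and `read_remote` are hypothetical. It assumes the remote reading is taken about halfway through the round trip.

```python
import time


def estimate_offset(read_local, read_remote, rounds=10):
    """Estimate (remote clock - local clock) using round-trip sampling.

    read_local  -- callable returning the local clock in seconds
    read_remote -- callable returning the master's clock in seconds
                   (hypothetical; in a real run this is a network round trip)
    Returns (offset, rtt) taken from the sample with the smallest round trip,
    since that sample has the least room for asymmetric delay.
    """
    best_rtt = float("inf")
    best_offset = 0.0
    for _ in range(rounds):
        t0 = read_local()
        remote = read_remote()
        t1 = read_local()
        rtt = t1 - t0
        if rtt < best_rtt:
            best_rtt = rtt
            # Assume the remote clock was read at the midpoint of the round trip.
            best_offset = remote - (t0 + rtt / 2.0)
    return best_offset, best_rtt


if __name__ == "__main__":
    # Both "clocks" are the same timer here, so the offset should be near zero.
    offset, rtt = estimate_offset(time.perf_counter, time.perf_counter)
    print(f"offset={offset:.9f} s, rtt={rtt:.9f} s")
```

An estimate like this only stays valid if both clocks advance at the same rate between the two syncs, which is exactly what seems to fail on the VMs.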
    
    One of the results:
    UPCR: UPC thread 0 of 4 on node1 (process 0 of 4, pid=13836)
    UPCR: UPC thread 3 of 4 on node2 (process 3 of 4, pid=24125)
    UPCR: UPC thread 2 of 4 on node2 (process 2 of 4, pid=24119)
    UPCR: UPC thread 1 of 4 on node1 (process 1 of 4, pid=13839)
    #1 local 10.550069 remote 10.550072
    #3 local 14.693299 remote 10.515528
    #0 local 0.000000 remote 0.000000
    #2 local 14.659530 remote 10.440920
    
    As you can see, the time elapsed between the two measurements is 14.7 sec on thread #3 but only 10.5 sec on the master thread. These measurements (the mt and et variables) are taken at the same moment and must be equal. The timings for thread 2 are also wrong, while thread 1 is fine since it is on the same node as thread 0.
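
To put numbers on the mismatch, here is a small sketch that subtracts the remote value from the local value for each thread. The figures are copied from the output above; the names `results` and `drifts` are just for illustration.

```python
# (local, remote) elapsed seconds per thread, copied from the run above.
results = {
    0: (0.000000, 0.000000),
    1: (10.550069, 10.550072),
    2: (14.659530, 10.440920),
    3: (14.693299, 10.515528),
}

# Difference between what the thread measured and what the master measured.
drifts = {thread: local - remote for thread, (local, remote) in results.items()}

for thread in sorted(drifts):
    print(f"thread {thread}: local - remote = {drifts[thread]:+.6f} s")
```

Threads 2 and 3 (on node2) are ahead of the master by about 4.2 sec, while thread 1 (on node1) differs by only a few microseconds.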
    
    I can't understand why this is happening. Maybe processor virtualization breaks the timers?
    
    I would greatly appreciate any suggestions, and I'm ready to run any tests needed to find the source of the problem.
    
    Regards,
    Nikita
    
    
    
    

