Re: bupc timing on VMs

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Apr 14 2010 - 08:03:56 PDT

    Nikita,
    
    I don't have time now to look at your problem in detail.  However, I 
    thought I'd take a moment to let you know that I am generally 
    distrustful of timing in VMs.
    
    I recall you are using VMware, which I have not used in many years.  
    However, my experience with Xen is that under heavy load the guest 
    kernel is sometimes not even capable of keeping an accurate clock.  The 
    problem is bad enough that ntpd is unable to correct for it.  
    So, I think that any work related to performance measurement should be 
    done only on real hardware.
    
    -Paul
    
    Nikita Andreev wrote:
    > Hello,
    >  
    > I'm doing some research on a home-made 2-node cluster. Each node is 
    > actually a 2-way virtual machine, and both VMs run on the dual-core 
    > processor of a single host system.
    >  
    > I'm testing a time synchronization algorithm originally developed by the 
    > PPW team (thanks to them for their support). This code (see attachment) 
    > works perfectly on a physical cluster, but when I run it on the VMs it 
    > gives wrong results. In the attached application I synchronize all 
    > threads to thread 0 twice. Sometimes, however, the clock on a syncing 
    > thread (one that was placed on a different node from thread 0) turns 
    > out to have run ahead of master thread 0.
    >  
    > One of the results:
    > UPCR: UPC thread 0 of 4 on node1 (process 0 of 4, pid=13836)
    > UPCR: UPC thread 3 of 4 on node2 (process 3 of 4, pid=24125)
    > UPCR: UPC thread 2 of 4 on node2 (process 2 of 4, pid=24119)
    > UPCR: UPC thread 1 of 4 on node1 (process 1 of 4, pid=13839)
    > #1 local 10.550069 remote 10.550072
    > #3 local 14.693299 remote 10.515528
    > #0 local 0.000000 remote 0.000000
    > #2 local 14.659530 remote 10.440920
    >  
    > As you can see, the time elapsed between the two measurements is 
    > 14.7 sec on thread #3 but only 10.5 sec on master thread 0. These 
    > measurements (the mt and et variables) are taken at the same moment 
    > and must be equal. The timings for thread 2 are also wrong, while 
    > those for thread 1 are fine, since it is on the same node as thread 0.
    >  
    > I can't understand why this is happening. Maybe processor 
    > virtualization breaks the timers?
    >  
    > I would greatly appreciate any suggestions, and I'm ready to run any 
    > tests needed to find the source of the problem.
    >  
    > Regards,
    > Nikita
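
Nikita's test synchronizes every thread to thread 0 twice. The PPW code itself is only in the attachment, so what follows is not its algorithm. It is a minimal sketch of one common approach, a Cristian-style round-trip offset estimate, and the names (`struct rtt_sample`, `estimate_offset`, `best_offset`) are hypothetical. It assumes the request and the reply take equal time in transit and that both clocks tick at the same rate between samples, which is exactly the assumption an unstable guest clock would violate.

```c
/* Hypothetical sketch of a round-trip clock-offset estimate; not the PPW code.
 * One sample: the worker reads its local clock just before sending a request
 * (t_send) and just after the reply arrives (t_recv); the master (thread 0)
 * stamps the reply with its own clock (t_master). All times are in seconds. */
struct rtt_sample {
    double t_send;
    double t_master;
    double t_recv;
};

/* Offset to add to the worker's clock to map it onto the master's clock,
 * assuming the request and the reply took equal time in transit. */
double estimate_offset(struct rtt_sample s)
{
    double midpoint = (s.t_send + s.t_recv) / 2.0;
    return s.t_master - midpoint;
}

/* Among n > 0 samples, use the one with the shortest round trip, since it
 * leaves the least room for asymmetric delays to bias the estimate. */
double best_offset(const struct rtt_sample *samples, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++) {
        double rtt_i = samples[i].t_recv - samples[i].t_send;
        double rtt_best = samples[best].t_recv - samples[best].t_send;
        if (rtt_i < rtt_best)
            best = i;
    }
    return estimate_offset(samples[best]);
}
```

Running such an estimate at two sync points and comparing the two offsets gives a rough measure of drift between the clocks; a large change between the two offsets suggests the clocks themselves are advancing at different rates.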
    
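The check Nikita describes (the interval measured on a thread's local clock should equal the same interval measured on thread 0's clock) can be expressed as a rate ratio. The following minimal C sketch uses hypothetical helper names and a caller-chosen relative tolerance; it is not part of the attached code. Applied to the output above, thread 1 gives a ratio very close to 1.0, while threads 2 and 3 give roughly 1.4, meaning their local clocks appeared to advance about 40% faster than thread 0's over the same interval.

```c
/* Hypothetical helpers for checking two clocks against each other; not part
 * of the attached UPC code. Elapsed times are in seconds. */

/* Ratio of the interval measured on a thread's local clock to the same
 * interval measured on thread 0's clock; 1.0 means equal clock rates. */
double clock_rate_ratio(double local_elapsed, double remote_elapsed)
{
    return local_elapsed / remote_elapsed;
}

/* Returns 1 if the two intervals agree within rel_tol (e.g. 0.01 for 1%),
 * 0 otherwise. */
int intervals_agree(double local_elapsed, double remote_elapsed, double rel_tol)
{
    double dev;
    if (remote_elapsed <= 0.0)  /* no interval, e.g. the all-zero row for thread 0 */
        return local_elapsed == remote_elapsed;
    dev = clock_rate_ratio(local_elapsed, remote_elapsed) - 1.0;
    if (dev < 0.0)
        dev = -dev;  /* absolute value without needing libm */
    return dev <= rel_tol;
}
```

With a 1% tolerance, the rows for threads 0 and 1 pass and the rows for threads 2 and 3 fail.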
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
