From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Apr 14 2010 - 08:03:56 PDT
Nikita,

I don't have time now to look at your problem in detail. However, I
thought I'd take a moment to let you know that I am generally
distrustful of timing in VMs. I recall you are using VMware, which I
have not used in many years. However, my experience with Xen is that
under heavy load the guest kernel is sometimes not even capable of
keeping an accurate clock. The problem is bad enough that ntpd is
unable to correct for it. So, I think that any work related to
performance measurement should be done only on real hardware.

-Paul

Nikita Andreev wrote:
> Hello,
>
> I'm doing some research on a homemade 2-node cluster. Actually each
> node is a 2-way virtual machine running on one host system's dual
> core processor.
>
> I'm testing a time synchronization algorithm originally developed by
> the PPW team (thanks to them for support). This code (see attachment)
> works perfectly on a physical cluster. When I run it on VMs it shows
> wrong results. In the attached application I sync all threads to
> thread 0 two times. But sometimes it turns out that time on the
> syncing thread (which was also distributed to a different node than
> thread 0) has gone ahead of master thread 0.
>
> One of the results:
> UPCR: UPC thread 0 of 4 on node1 (process 0 of 4, pid=13836)
> UPCR: UPC thread 3 of 4 on node2 (process 3 of 4, pid=24125)
> UPCR: UPC thread 2 of 4 on node2 (process 2 of 4, pid=24119)
> UPCR: UPC thread 1 of 4 on node1 (process 1 of 4, pid=13839)
> #1 local 10.550069 remote 10.550072
> #3 local 14.693299 remote 10.515528
> #0 local 0.000000 remote 0.000000
> #2 local 14.659530 remote 10.440920
>
> As you can see, the time elapsed between time measurements on thread
> #3 is 14.7 sec and on the master thread 10.5 sec. These measurements
> (mt and et variables) happen at the same moment and must be equal.
> Timings for thread 2 are also wrong, and ok for thread 1 since it's
> on the same node.
>
> I can't comprehend why this is happening. Maybe processor
> virtualization breaks timers?
>
> I would greatly appreciate any suggestions and I'm ready to do any
> tests to find out the source of the problem.
>
> Regards,
> Nikita

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory