From: Nikita Andreev (nik_at_kemsu.ru)
Date: Tue Mar 23 2010 - 02:55:59 PDT
Paul,
I've run this test several times, and here is what I got:
Get 1 tick: 2797ns. 1 convert: 2ns
Get 1 tick: 2383ns. 1 convert: 4ns
Get 1 tick: 82ns. 1 convert: 2ns
Get 1 tick: 2137ns. 1 convert: 2ns
Get 1 tick: 2861ns. 1 convert: 2ns
Get 1 tick: 2773ns. 1 convert: 2ns
Get 1 tick: 90ns. 1 convert: 3ns
Get 1 tick: 2595ns. 1 convert: 3ns
Get 1 tick: 2881ns. 1 convert: 2ns
Get 1 tick: 2849ns. 1 convert: 3ns
I'm running CentOS 5 as a VMware guest OS under VMware ESX Server 3, on a
host with Intel Xeon E5345 @ 2.33GHz CPUs.
A physical CentOS 5 host with Intel Xeon E5410 @ 2.33GHz CPUs produces the
following results:
Get 1 tick: 38ns. 1 convert: 2ns
Get 1 tick: 38ns. 1 convert: 2ns
Get 1 tick: 38ns. 1 convert: 2ns
Get 1 tick: 38ns. 1 convert: 2ns
Get 1 tick: 38ns. 1 convert: 2ns
Get 1 tick: 38ns. 1 convert: 2ns
Get 1 tick: 38ns. 1 convert: 2ns
Get 1 tick: 39ns. 1 convert: 2ns
Get 1 tick: 38ns. 1 convert: 2ns
Get 1 tick: 38ns. 1 convert: 2ns
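For readers without the attached test program, a measurement loop of this kind can be sketched roughly as follows. This is a minimal sketch, not the attached program: it assumes a POSIX system and uses clock_gettime(CLOCK_MONOTONIC) as a stand-in for bupc_ticks_now(), and the names now_ns() and avg_query_cost_ns() are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for bupc_ticks_now(): a monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average cost in ns of one timer query, over `iters` back-to-back queries. */
static double avg_query_cost_ns(long iters)
{
    assert(iters > 0);
    volatile uint64_t sink = 0;  /* keeps the compiler from dropping the calls */
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++)
        sink = now_ns();
    uint64_t stop = now_ns();
    (void)sink;
    return (double)(stop - start) / (double)iters;
}
```

Calling avg_query_cost_ns(1000000) and printing the result gives one number per run, comparable to the "Get 1 tick" column above. An average hides the distribution, so a run that mixes cheap and expensive queries would need per-call samples to tell the two apart.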
Anyway, the reason I'm asking is something you said in one of your earlier
messages (from February 27th):
"... If you are concerned about the costs (time for conversion or space for
storage) then it may be possible that one could record "raw" ticks in a
trace file plus enough additional information to allow ticks-to-ns
conversion to be performed outside of the UPC code (e.g. by a tool that
processes the trace file). ..."
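The raw-ticks approach described in that quote could look roughly like the sketch below. The layout is purely illustrative: trace_header, trace_record and trace_record_ns() are hypothetical names, and how the tick rate is obtained at trace start is left open. The arithmetic assumes a tick rate below about 18 GHz so that the remainder term cannot overflow 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical trace-file header: written once, enough to convert later. */
typedef struct {
    uint64_t ticks_per_sec;  /* tick rate recorded at trace start */
    uint64_t start_ticks;    /* raw tick value at trace start */
} trace_header;

/* Hypothetical trace record: the raw tick count is stored unconverted. */
typedef struct {
    uint64_t raw_ticks;
    uint32_t event_id;
} trace_record;

/* Offline conversion, done by the trace-processing tool:
 * nanoseconds elapsed between trace start and this record. */
static uint64_t trace_record_ns(const trace_header *h, const trace_record *r)
{
    assert(h->ticks_per_sec != 0);
    uint64_t d = r->raw_ticks - h->start_ticks;
    uint64_t whole_sec = d / h->ticks_per_sec;  /* full seconds */
    uint64_t rem = d % h->ticks_per_sec;        /* leftover ticks, < 1 s */
    return whole_sec * 1000000000ull
         + rem * 1000000000ull / h->ticks_per_sec;
}
```

Splitting the difference into whole seconds and a remainder keeps the conversion in integer arithmetic, so no precision is lost beyond the final truncation to whole nanoseconds.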
Do we agree that the tick-to-seconds conversion adds almost no overhead, so
there is no need to write timestamps in ticks? Let's say we use doubles to
store the converted tick-to-seconds values. A double has 6 digits after the
decimal point, which means that anything below 1 microsecond is discarded.
Hence VMs are the only target of perturbation, and even there the cause of
the perturbation is not the conversion.
The next point is about the space for storage. bupc_ticks_t is basically a
uint64_t. double is also a 64-bit type.
So it seems that converting ticks to seconds up front is simple and
reliable, isn't it?
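The storage and precision assumptions in this argument can be checked directly instead of taken on faith. The sketch below assumes IEEE 754 doubles, which is what the x86 platforms in this thread use; the names TICK_BYTES, DOUBLE_BYTES and ns_exact_in_double() are hypothetical. It compares the two sizes and tests whether a given integer nanosecond count survives a round trip through double.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes of the two candidate timestamp types on this platform. */
enum { TICK_BYTES = sizeof(uint64_t), DOUBLE_BYTES = sizeof(double) };

/* Returns 1 if an integer nanosecond count is unchanged after being
 * stored in a double and read back, 0 otherwise. */
static int ns_exact_in_double(uint64_t ns)
{
    volatile double d = (double)ns;  /* volatile forces rounding to double */
    assert(d >= 0.0);
    if (d >= 18446744073709551616.0) /* rounded up to 2^64: cannot be exact */
        return 0;
    return (uint64_t)d == ns;
}
```

With a 53-bit significand, every integer up to 2^53 is exact, so nanosecond counts held in a double stay exact for about 104 days of elapsed time. Beyond that, neighbouring values start to collapse onto the same double.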
Nikita
-----Original Message-----
From: owner-upc-users_at_lbl_dot_gov [mailto:owner-upc-users_at_lbl_dot_gov] On Behalf Of
Paul H. Hargrove
Sent: Tuesday, March 23, 2010 2:48 PM
To: Nikita Andreev
Cc: upc-users_at_lbl_dot_gov
Subject: Re: Expense of BUPC timer functions
Hello, Nikita.
The fact that you are using a virtual machine is probably not perturbing
the cost of bupc_ticks_now(), at least not to the extent you report
seeing. Assuming a modern AMD or Intel CPU, this function is using the
RDTSC instruction on most OSes, which should be quite cheap.
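To illustrate what a query based on this instruction amounts to, here is a minimal sketch. It is not the Berkeley UPC source: it assumes GCC or Clang, where the __rdtsc() intrinsic is available on x86, and falls back to clock_gettime() on other architectures. The names read_ticks() and back_to_back_ticks() are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() for the fallback */
#include <assert.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>           /* __rdtsc() on GCC and Clang */
#endif

/* Read a raw tick counter: the CPU timestamp counter on x86,
 * a monotonic nanosecond clock elsewhere. */
static uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return (uint64_t)__rdtsc();
#else
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Ticks elapsed across two consecutive reads: a rough indication
 * of what a single query costs on this machine. */
static uint64_t back_to_back_ticks(void)
{
    uint64_t t0 = read_ticks();
    uint64_t t1 = read_ticks();
    return t1 - t0;
}
```

On bare metal the x86 path is a single unprivileged instruction with no system call, which is why the query is expected to be cheap.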
The observation that the first call to bupc_ticks_to_ns() is more
expensive than later calls is to be expected. The first instance parses
/proc/cpuinfo to get the clock rate and stores the value for reuse in
subsequent calls.
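The pay-once, reuse-afterwards pattern described above can be sketched as follows. This is an assumption-laden illustration, not the Berkeley UPC implementation: read_cpu_mhz(), ticks_to_ns_cached(), cached_mhz and lookups are hypothetical names, the parser looks only at the first "cpu MHz" line of /proc/cpuinfo, and it falls back to a placeholder 1 GHz rate when that line is missing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static double cached_mhz = 0.0;  /* 0 means "not looked up yet" */
static int lookups = 0;          /* how many times the slow path ran */

/* Slow path: scan /proc/cpuinfo for the first "cpu MHz" line. */
static double read_cpu_mhz(void)
{
    double mhz = 0.0;
    char line[256];
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f) {
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "cpu MHz : %lf", &mhz) == 1)
                break;
        fclose(f);
    }
    return mhz > 0.0 ? mhz : 1000.0;  /* placeholder rate if nothing found */
}

/* Convert ticks to ns. The first call pays for the file parse;
 * later calls reuse the cached rate and cost one multiply and divide. */
static uint64_t ticks_to_ns_cached(uint64_t ticks)
{
    if (cached_mhz == 0.0) {
        cached_mhz = read_cpu_mhz();
        lookups++;
    }
    assert(cached_mhz > 0.0);
    /* ticks / (MHz * 1e6) seconds, times 1e9 ns per second */
    return (uint64_t)((double)ticks * 1000.0 / cached_mhz);
}
```

A benchmark that wants the steady-state cost can make one throwaway conversion call before the timed loop. The sketch is not thread-safe; concurrent first calls would need a lock or a one-time initializer.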
Running on a 2.3GHz Intel Xeon E5410 from a Xen Dom0 kernel, the output
from your attached program is
Get 1 tick: 33ns. 1 convert: 2ns
I tried on a Xen HVM DomU running on the same machine:
Get 1 tick: 34ns. 1 convert: 2ns
And on a Xen PV DomU on the same machine:
Get 1 tick: 33ns. 1 convert: 2ns
So virtualization is probably not a significant factor, at least under Xen.
On an older 2.2GHz Opteron which is not running Xen I see
Get 1 tick: 6ns. 1 convert: 2ns
And an old 2.8GHz Pentium-4 yields
Get 1 tick: 82ns. 1 convert: 4ns
So, there can be significant variation among platforms, and I don't know
if the 33-vs-6 difference is Xen related or not.
So, I will agree with you that the relative costs of query and conversion
are not ordered as the documentation suggests.
However, I can't reproduce the 2383ns query overhead.
If you can tell me more about the platform you are running on, perhaps I
could understand the extraordinarily high query cost you report.
-Paul
Nikita Andreev wrote:
>
> Hello Paul and all,
>
> I'm measuring the overhead of bupc_ticks_now() and bupc_ticks_to_ns()
> and the results don't look like what I expected. Find the test attached.
>
> I ran 1 million iterations and got the following:
>
> bupc_ticks_now: 2383ns
>
> bupc_ticks_to_ns: 4ns
>
> So the conversion is a lot faster than the query. But the documentation
> says: The bupc_ticks_to_{us,ns}() conversion calls can be
> significantly more expensive than the bupc_ticks_now() tick query.
>
> What I also noticed is that the first bupc_ticks_to_ns() call in a loop
> is very slow. It can take as long as 1.5 milliseconds. Almost all the
> others are very fast.
>
> What am I missing here?
>
> P.S. I ran this test on a virtual machine, if that makes any
> difference.
>
> Regards,
>
> Nikita Andreev
>
--
Paul H. Hargrove PHHargrove_at_lbl_dot_gov
Future Technologies Group Tel: +1-510-495-2352
HPC Research Department Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory