From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Mar 23 2010 - 13:31:45 PDT
Nikita,

The performance and internal representation of the ticks-related code are
platform dependent. If you are only (or just primarily) interested in Linux
on x86 and x86-64, then I would agree with you that the conversion is cheap.
However, there do exist other platforms where a "tick" is a structure of two
32-bit fields (AIX and IRIX), and conversion is more costly as a result.

I also agree that storing "raw" ticks vs. "double"s uses the same storage
(BTW: a 64-bit IEEE double has 15 decimal digits of precision, even if the
default printf format shows only 6).

So, I think it would be fine if you want to perform the conversion at
runtime rather than via post-processing as I had suggested previously.
While skipping the conversion is cheaper than doing it, you get simpler,
more portable/maintainable code if you perform the conversion at runtime.

I am surprised that VMware has such a high average-case cost for the query.
The code you attached previously runs for 1 million iterations. Perhaps over
1 million iterations your process (or even the entire VM) is getting
de-scheduled, which would skew your measurement. Do you still see results in
the 2500ns range if you lower the iteration count to something like 10000?
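In case it helps, below is roughly the measurement loop I have in mind.
This is an untested sketch: it assumes an x86/x86-64 Linux target where the
tick type is integral (so ticks can be subtracted and accumulated directly),
and it assumes bupc_extensions.h is the header that declares the tick
functions on your installation.

  /* Sketch of a reduced-iteration timing test.
   * Compile with: upcc -o ticktest ticktest.c
   * Run with:     upcrun -n 1 ticktest
   */
  #include <stdio.h>
  #include <stdint.h>
  #include <bupc_extensions.h>  /* assumed header for the bupc_ticks_* API */

  #define N 10000  /* small enough that a single de-scheduling event
                      can't hide inside the average */

  int main(void) {
    bupc_tick_t start, end;
    volatile uint64_t sink = 0;  /* keeps the calls from being optimized away */
    int i;

    /* The first bupc_ticks_to_ns() call parses /proc/cpuinfo to learn the
       clock rate, so take that one-time cost out of the measured region. */
    (void) bupc_ticks_to_ns(bupc_ticks_now());

    /* Per-call cost of the tick query.  NOTE: treating bupc_tick_t as an
       integer is the x86/x86-64 Linux assumption mentioned above. */
    start = bupc_ticks_now();
    for (i = 0; i < N; i++)
      sink += (uint64_t) bupc_ticks_now();
    end = bupc_ticks_now();
    printf("Get 1 tick: %lluns. ",
           (unsigned long long) (bupc_ticks_to_ns(end - start) / N));

    /* Per-call cost of the ticks-to-ns conversion. */
    start = bupc_ticks_now();
    for (i = 0; i < N; i++)
      sink += bupc_ticks_to_ns((bupc_tick_t) (i + 1));
    end = bupc_ticks_now();
    printf("1 convert: %lluns.\n",
           (unsigned long long) (bupc_ticks_to_ns(end - start) / N));

    return 0;
  }

If de-scheduling really is the culprit, the N=10000 runs should land much
closer to the 38ns you measured on the physical host, with only the
occasional outlier.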
-Paul

Nikita Andreev wrote:
> Paul,
>
> I've run this test several times, and here is what I got:
> Get 1 tick: 2797ns. 1 convert: 2ns
> Get 1 tick: 2383ns. 1 convert: 4ns
> Get 1 tick: 82ns. 1 convert: 2ns
> Get 1 tick: 2137ns. 1 convert: 2ns
> Get 1 tick: 2861ns. 1 convert: 2ns
> Get 1 tick: 2773ns. 1 convert: 2ns
> Get 1 tick: 90ns. 1 convert: 3ns
> Get 1 tick: 2595ns. 1 convert: 3ns
> Get 1 tick: 2881ns. 1 convert: 2ns
> Get 1 tick: 2849ns. 1 convert: 3ns
>
> I'm running CentOS 5 in a VMware guest OS under VMware ESX Server 3, on a
> host with Intel Xeon E5345 @ 2.33GHz CPUs.
>
> A physical CentOS 5 host with Intel Xeon E5410 @ 2.33GHz CPUs produces the
> following results:
> Get 1 tick: 38ns. 1 convert: 2ns
> Get 1 tick: 38ns. 1 convert: 2ns
> Get 1 tick: 38ns. 1 convert: 2ns
> Get 1 tick: 38ns. 1 convert: 2ns
> Get 1 tick: 38ns. 1 convert: 2ns
> Get 1 tick: 38ns. 1 convert: 2ns
> Get 1 tick: 38ns. 1 convert: 2ns
> Get 1 tick: 39ns. 1 convert: 2ns
> Get 1 tick: 38ns. 1 convert: 2ns
> Get 1 tick: 38ns. 1 convert: 2ns
>
> Anyway, the reason I'm asking is something you said in one of your earlier
> letters (from the 27th of February):
> "... If you are concerned about the costs (time for conversion or space for
> storage) then it may be possible that one could record "raw" ticks in a
> trace file plus enough additional information to allow ticks-to-ns
> conversion to be performed outside of the UPC code (e.g. by a tool that
> processes the trace file). ..."
>
> Are we agreed that the tick-to-seconds conversion produces nearly no
> overhead, and that there is no need to write timestamps in ticks? Say we
> use doubles to store the converted tick-to-seconds values. A double has 6
> digits after the decimal point, which means that anything smaller than 1
> microsecond is discarded. Hence the only remaining source of perturbation
> is VMs, and moreover the cause of that perturbation is not the conversion.
>
> The next point is the space for storage: bupc_tick_t is basically a
> uint64_t, and double is also a 64-bit type.
>
> So it seems that doing the tick-to-seconds conversion in the first place
> is simple and reliable, isn't it?
>
> Nikita
>
>
> -----Original Message-----
> From: owner-upc-users_at_lbl_dot_gov [mailto:owner-upc-users_at_lbl_dot_gov]
> On Behalf Of Paul H. Hargrove
> Sent: Tuesday, March 23, 2010 2:48 PM
> To: Nikita Andreev
> Cc: upc-users_at_lbl_dot_gov
> Subject: Re: Expense of BUPC timer functions
>
> Hello, Nikita.
>
> The fact that you are using a virtual machine is probably not perturbing
> the cost of bupc_ticks_now(), at least not to the extent you report
> seeing. Assuming a modern AMD or Intel CPU, this function is using the
> RDTSC instruction on most OSes, which should be quite cheap.
>
> The observation that the first call to bupc_ticks_to_ns() is more
> expensive than later calls is to be expected. The first instance parses
> /proc/cpuinfo to get the clock rate and stores the value for reuse in
> subsequent calls.
>
> Running on a 2.3GHz Intel Xeon E5410 from a Xen Dom0 kernel, the output
> from your attached program is
> Get 1 tick: 33ns. 1 convert: 2ns
> I tried on a Xen HVM DomU running on the same machine:
> Get 1 tick: 34ns. 1 convert: 2ns
> And on a Xen PV DomU on the same machine:
> Get 1 tick: 33ns. 1 convert: 2ns
> So virtualization is probably not a significant factor, at least under Xen.
>
> On an older 2.2GHz Opteron which is not running Xen I see
> Get 1 tick: 6ns. 1 convert: 2ns
> And an old 2.8GHz Pentium-4 yields
> Get 1 tick: 82ns. 1 convert: 4ns
> So there can be significant variation among platforms, and I don't know
> whether the 33-vs-6 difference is Xen-related or not.
>
> So, I will agree with you that the relative costs of the query and the
> conversion are not ordered as the documentation suggests. However, I
> can't reproduce the 2384ns query overhead.
>
> If you can tell me more about the platform you are running on, perhaps I
> could understand the extraordinarily high query cost you report.
>
> -Paul
>
> Nikita Andreev wrote:
>
>> Hello Paul and all,
>>
>> I'm measuring the overhead of bupc_ticks_now() and bupc_ticks_to_ns(),
>> and the results don't look like I expected. Find the test attached.
>>
>> I made 1 million iterations and got the following:
>>
>> bupc_ticks_now: 2383ns
>>
>> bupc_ticks_to_ns: 4ns
>>
>> So the conversion is performed a lot faster than the query. But the
>> documentation says: "The bupc_ticks_to_{us,ns}() conversion calls can be
>> significantly more expensive than the bupc_ticks_now() tick query."
>>
>> What I also noticed is that the first bupc_ticks_to_ns() call in a loop
>> is very slow; it can take as much as 1.5 milliseconds. Almost all the
>> others are very fast.
>>
>> What am I missing here?
>>
>> P.S. I performed this test on a virtual machine, if that makes any
>> difference.
>>
>> Regards,
>>
>> Nikita Andreev
>>
>>

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory