RE: Expense of BUPC timer functions

From: Nikita Andreev (nik_at_kemsu.ru)
Date: Tue Mar 23 2010 - 02:55:59 PDT

    Paul,
    
    I've run this test several times and here is what I got:
    Get 1 tick: 2797ns. 1 convert: 2ns
    Get 1 tick: 2383ns. 1 convert: 4ns
    Get 1 tick: 82ns. 1 convert: 2ns
    Get 1 tick: 2137ns. 1 convert: 2ns
    Get 1 tick: 2861ns. 1 convert: 2ns
    Get 1 tick: 2773ns. 1 convert: 2ns
    Get 1 tick: 90ns. 1 convert: 3ns
    Get 1 tick: 2595ns. 1 convert: 3ns
    Get 1 tick: 2881ns. 1 convert: 2ns
    Get 1 tick: 2849ns. 1 convert: 3ns
    
    I'm running CentOS 5 as a VMware guest OS under VMware ESX Server 3, on a
    host with Intel Xeon E5345 @ 2.33GHz CPUs.
    
    A physical CentOS 5 host with Intel Xeon E5410 @ 2.33GHz CPUs produces the
    following results:
    Get 1 tick: 38ns. 1 convert: 2ns
    Get 1 tick: 38ns. 1 convert: 2ns
    Get 1 tick: 38ns. 1 convert: 2ns
    Get 1 tick: 38ns. 1 convert: 2ns
    Get 1 tick: 38ns. 1 convert: 2ns
    Get 1 tick: 38ns. 1 convert: 2ns
    Get 1 tick: 38ns. 1 convert: 2ns
    Get 1 tick: 39ns. 1 convert: 2ns
    Get 1 tick: 38ns. 1 convert: 2ns
    Get 1 tick: 38ns. 1 convert: 2ns
    
    
    Anyway, the reason I'm asking is something you said in one of your earlier
    messages (from the 27th of February):
    "... If you are concerned about the costs (time for conversion or space for
    storage) then it may be possible that one could record "raw" ticks in a
    trace file plus enough additional information to allow ticks-to-ns
    conversion to be performed outside of the UPC code (e.g. by a tool that
    processes the trace file).  ..."
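
The quoted suggestion (raw ticks plus enough information to convert them later) could look roughly like the sketch below. The record layout, the field names, and trace_record_to_ns() are hypothetical and not part of Berkeley UPC; the sketch assumes the timer frequency is sampled once and written into a trace header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical trace layout: one header, then raw tick records. */
typedef struct {
    uint64_t ticks_per_sec;  /* timer frequency, sampled once at startup */
    uint64_t start_tick;     /* raw tick value when tracing began */
} trace_header_t;

typedef struct {
    uint64_t tick;      /* raw timer reading, stored unconverted */
    uint32_t event_id;  /* application-defined event code */
} trace_record_t;

/* Offline conversion: nanoseconds since trace start for one record.
 * Whole seconds and the sub-second remainder are scaled separately so
 * the multiplication by 1e9 cannot overflow for timer frequencies
 * below roughly 18 GHz. */
uint64_t trace_record_to_ns(const trace_header_t *h, const trace_record_t *r)
{
    assert(h->ticks_per_sec > 0 && r->tick >= h->start_tick);
    uint64_t delta = r->tick - h->start_tick;
    uint64_t secs  = delta / h->ticks_per_sec;
    uint64_t rem   = delta % h->ticks_per_sec;
    return secs * 1000000000ULL + rem * 1000000000ULL / h->ticks_per_sec;
}
```

With this layout the traced program only stores the 64-bit tick value, and a separate tool calls trace_record_to_ns() while reading the file.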
    
    Do we agree that the tick-to-seconds conversion adds almost no overhead, so
    there is no need to write timestamps as raw ticks? Let's say we store the
    converted seconds value in a double and keep 6 digits after the decimal
    point (the default printf precision; the double itself carries about 15 to
    17 significant digits). Anything below 1 microsecond is then discarded, so
    a 2ns conversion cannot show up in the trace. Hence only VMs are subject to
    perturbation, and even there the cause is not the conversion.
    
    The next point is about the space for storage. bupc_tick_t is basically a
    uint64_t, and double is also a 64-bit type.
    
    So it seems that converting ticks to seconds up front is simple and
    reliable, doesn't it?
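
The storage and precision points can be checked with a small sketch. ticks_to_sec() and sec_resolution_near() are illustrative names, not Berkeley UPC functions, and the tick rate is assumed to be known to the caller.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Convert a raw tick count to seconds; ticks_per_sec is assumed known. */
double ticks_to_sec(uint64_t ticks, double ticks_per_sec)
{
    assert(ticks_per_sec > 0.0);
    return (double)ticks / ticks_per_sec;
}

/* Upper bound on the gap between adjacent doubles near t seconds:
 * the spacing of doubles around t is at most t * DBL_EPSILON. */
double sec_resolution_near(double t)
{
    return t * DBL_EPSILON;
}

/* Tick counts up to 2^DBL_MANT_DIG (2^53 for IEEE doubles) convert to
 * double without rounding; larger counts may lose their low bits. */
#define EXACT_TICK_LIMIT (1ULL << DBL_MANT_DIG)
```

On a platform with IEEE 754 doubles, sec_resolution_near(3600.0) is below one picosecond, so a timestamp one hour into a run still resolves far finer than 1ns. The trade-off is in the other direction: a 64-bit tick counter is exact over its whole range, while a double is exact only up to EXACT_TICK_LIMIT ticks, which is roughly 44 days at 2.33GHz.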
    
    Nikita
    
    
    -----Original Message-----
    From: owner-upc-users_at_lbl_dot_gov [mailto:owner-upc-users_at_lbl_dot_gov] On Behalf Of
    Paul H. Hargrove
    Sent: Tuesday, March 23, 2010 2:48 PM
    To: Nikita Andreev
    Cc: upc-users_at_lbl_dot_gov
    Subject: Re: Expense of BUPC timer functions
    
    Hello, Nikita.
    
    The fact that you are using a virtual machine is probably not perturbing 
    the cost of bupc_ticks_now(), at least not to the extent you report 
    seeing. Assuming a modern AMD or Intel CPU, this function is using the 
    RDTSC instruction on most OSes, which should be quite cheap.
    
    The observation that the first call to bupc_ticks_to_ns() is more 
    expensive than later calls is to be expected. The first instance parses 
    /proc/cpuinfo to get the clock rate and stores the value for reuse in 
    subsequent calls.
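
The first-call behaviour described above is ordinary lazy initialization. The sketch below illustrates the pattern only; it is not the Berkeley UPC implementation. The function names, the 1000 MHz fallback, and the assumption that the "cpu MHz" field is present are all made up for the example.

```c
#include <assert.h>
#include <stdio.h>

static double cached_mhz = 0.0;  /* 0 means "not yet initialized" */
static int parse_count = 0;      /* how many times the file was parsed */

/* Read the first "cpu MHz" line from a cpuinfo-style file.
 * Returns fallback_mhz when the file or the field is missing. */
static double parse_cpu_mhz(const char *path, double fallback_mhz)
{
    double mhz = fallback_mhz;
    char line[256];
    FILE *f = fopen(path, "r");
    parse_count++;
    if (f == NULL)
        return mhz;
    while (fgets(line, sizeof line, f) != NULL) {
        double v;
        if (sscanf(line, "cpu MHz : %lf", &v) == 1 && v > 0.0) {
            mhz = v;
            break;
        }
    }
    fclose(f);
    return mhz;
}

/* Slow on the first call (file I/O), a cached load afterwards.
 * Not thread-safe as written. */
double clock_mhz(void)
{
    if (cached_mhz == 0.0)
        cached_mhz = parse_cpu_mhz("/proc/cpuinfo", 1000.0);
    return cached_mhz;
}

/* ticks / MHz gives microseconds, so multiply by 1000 for nanoseconds. */
unsigned long long ticks_to_ns_sketch(unsigned long long ticks)
{
    assert(clock_mhz() > 0.0);
    return (unsigned long long)((double)ticks * 1000.0 / clock_mhz());
}
```

Only the first ticks_to_ns_sketch() call pays for opening and scanning the file, which matches the pattern of one slow conversion followed by fast ones.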
    
    Running on a 2.3GHz Intel Xeon E5410 from a Xen Dom0 kernel, the output 
    from your attached program is
    Get 1 tick: 33ns. 1 convert: 2ns
    I tried on a Xen HVM DomU running on the same machine:
    Get 1 tick: 34ns. 1 convert: 2ns
    And on a Xen PV DomU on the same machine:
    Get 1 tick: 33ns. 1 convert: 2ns
    So virtualization is probably not a significant factor, at least under Xen.
    
    On an older 2.2GHz Opteron which is not running Xen I see
    Get 1 tick: 6ns. 1 convert: 2ns
    And an old 2.8GHz Pentium-4 yields
    Get 1 tick: 82ns. 1 convert: 4ns
    So, there can be significant variation among platforms, and I don't know 
    if the 33-vs-6 difference is Xen related or not.
    
    So, I will agree with you that the relative costs of query and conversion 
    are not ordered as the documentation suggests.
    However, I can't reproduce the 2383ns query overhead.
    
    
    If you can tell me more about the platform you are running on perhaps I 
    could understand the extraordinarily high query cost you report.
    
    -Paul
    
    Nikita Andreev wrote:
    >
    > Hello Paul and all,
    >
    > I'm measuring the overhead of bupc_ticks_now() and bupc_ticks_to_ns() 
    > and the results don't look like I expected. Find the test attached.
    >
    > I've made 1 million iterations and have got the following:
    >
    > bupc_ticks_now: 2383ns
    >
    > bupc_ticks_to_ns: 4ns
    >
    > So the conversion is a lot faster than the query. But the documentation 
    > says: The bupc_ticks_to_{us,ns}() conversion calls can be 
    > significantly more expensive than the bupc_ticks_now() tick query.
    >
    > What I also noticed is that the first bupc_ticks_to_ns() call in a loop 
    > is very slow. It can take as long as 1.5 milliseconds. Almost all the 
    > others are very fast.
    >
    > What am I missing here?
    >
    > P.S. I've performed this test on a virtual machine if it makes any 
    > difference.
    >
    > Regards,
    >
    > Nikita Andreev
    >
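
The attached test is not reproduced in this message. A minimal sketch of this kind of measurement, assuming plain C11 with timespec_get() standing in for bupc_ticks_now(), could look like this; now_ns() and avg_query_cost_ns() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for the tick query: wall-clock time in nanoseconds (C11). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Average cost in ns of one now_ns() call over `iters` back-to-back
 * calls. The volatile sink keeps the compiler from dropping the loop. */
double avg_query_cost_ns(unsigned long iters)
{
    assert(iters > 0);
    volatile uint64_t sink = 0;
    uint64_t start = now_ns();
    for (unsigned long i = 0; i < iters; i++)
        sink = now_ns();
    uint64_t end = now_ns();
    (void)sink;
    return (double)(end - start) / (double)iters;
}
```

An average over a million iterations hides a single slow first call, so a one-time initialization cost has to be timed separately from the steady-state cost.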
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
