Re: Defining block size during runtime

From: sainath l (ls.sainath_at_gmail_dot_com)
Date: Sat Jul 25 2009 - 04:50:29 PDT
In reply to: Gary Funck: "Re: Defining block size during runtime"

Hi guys,



@ Paul
I am using BUPC 2.8.0.


@ Gary


An alternative might be to increase the number of
iterations until the total time taken exceeds some threshold,
say 10 seconds.  Then for any reasonable implementation of
upc_barrier you can assume that its impact on the total time
is not significant.  Something like this:
#define MIN_TEST_TIME 10.0
     while (flag)
       {
         upc_barrier;
         start = get_time ();
         for (i = 0; i < iter; i++)
           {
             upc_barrier;
             upc_all_broadcast (&B[0].y[0], A, mess_size * sizeof (int),
                                UPC_IN_NOSYNC | UPC_OUT_NOSYNC);
             upc_barrier;
           }
         T = get_time () - start;
         upc_barrier;
         if (MYTHREAD == 0)
           {
             /* [...] */
             if (T < MIN_TEST_TIME)
               {
                 iter = iter * 2;
               }

So if, after the while loop, I add

start = get_time();
for(i = 0; i < iter; i++)
{
        upc_barrier;
        upc_barrier;
}
temp = get_time() - start;

I should get a more accurate answer, right? The time taken by the
barriers alone would not be greater than T.


This is my get_time() in gettime.h:

#include <stddef.h>
#include <sys/time.h>

/* Return wall-clock seconds elapsed since the second of the first call. */
double get_time(void)
{
        static int Fcall = 1;
        static time_t Init_time;   /* time_t matches the type of tv_sec */
        double Time;
        struct timeval Tp;

        gettimeofday(&Tp, NULL);
        if (Fcall == 1)
        {
                /* Remember the starting second so later values stay small. */
                Init_time = Tp.tv_sec;
                Fcall = 0;
        }
        Time = (double)(Tp.tv_sec - Init_time) + (double) Tp.tv_usec * 1.0e-6;
        return Time;
}


Thank you very much for the suggestions and help.

Cheers,
Sainath


On Sat, Jul 25, 2009 at 8:46 AM, Gary Funck <gary_at_intrepid_dot_com> wrote:

>
> On 07/25/09 05:37:20, sainath l wrote:
> >    Hi,
> >
> >    Thank you very much for answering my questions Paul. And extremely sorry
> >    for not providing the "gettime.h" file. Will make sure that I provide all
> >    the related files from next time.
>
>
> I used this simple implementation:
>
> #include <time.h>
>
> double
> get_time()
> {
>   clock_t t = clock();
>   return (double) t / (double) CLOCKS_PER_SEC;
> }
>
> I'm uncertain as to whether clock() will return the sum of
> the processor time of all currently running processes in
> the UPC program, or just the time of the calling process.  I think
> only the calling process.  Things may become more problematic
> if pthreads are in play.
>
> What I've done in the past for this sort of thing is to declare
> a shared array:
>
> shared strict double cpu_times[THREADS];
>
> and then have each thread write the current iteration's
> per-thread time into cpu_times[MYTHREAD].  Thread 0 must
> then sum up all the cpu_times[] in order to arrive at the
> cpu time for the entire UPC program.  As noted, another approach
> would likely have to be taken if pthread-ed UPC threads are
> used. In mixed process/pthreads, distributed settings, things
> become even more interesting.
>
> >
> >    The code is running fine on an X4600 SMP node with 16 procs,
> >    but it is not running on the XT 4.
> >    When I run it on the XT 4, the code breaks during the first iteration:
> >    the first iteration does not complete, and the printf after the
> >    upc_free(B) call does not execute.
>
> Some things that I noticed in the program:
>
> This section of code is apparently trying to find a value
> of 'iter' for which the execution time of upc_all_broadcast()
> will exceed the overhead of two back-to-back barrier calls
> and the for loop overhead.
>
>
>      while (flag)
>        {
>          upc_barrier;
>          start = get_time ();
>          for (i = 0; i < iter; i++)
>            {
>              upc_barrier;
>              upc_all_broadcast (&B[0].y[0], A, mess_size * sizeof (int),
>                                 UPC_IN_NOSYNC | UPC_OUT_NOSYNC);
>              upc_barrier;
>            }
>          T = get_time () - start;
>          upc_barrier;
>
>          start = get_time ();
>          for (i = 0; i < iter; i++)
>            {
>              upc_barrier;
>              upc_barrier;
>            }
>          temp = get_time () - start;
>          upc_barrier;
>
>          if (MYTHREAD == 0)
>            {
>              for (i = 0; i < THREADS; i++)
>                {
>                  for (j = 0; j < mess_size; j++)
>                    {
>                      printf ("%d ", B[i].y[j]);
>                    }
>                  printf ("\n");
>                }
>              printf ("\n%lf %d %d \n", (T - temp), iter, mess_size);
>
>              if ((T - temp) < 0.1)
>                {
>                  iter = iter * 2;
>                }
>
>              [...]
>
> 1. Note that thread 0 is basing its idea of execution time upon
> its call to get_time().  As pointed out earlier, what is probably intended
> here is that thread 0 would work with the total cpu time across all threads.
> This might not be necessary if the only goal is to tune 'iter', but is
> most likely necessary if the idea is to find the cpu time across the entire
> program used by the upc_all_broadcast() call at various message sizes.
>
> 2. The value of time T above is the time taken to execute a number
> of upc_all_broadcast() calls determined by 'iter', along with
> two upc_barrier's for each iteration.  The value 'temp' is the time
> taken to execute 2*iter upc_barrier's (plus some loop overhead, which
> is likely not significant in comparison).  The value of 'iter' will
> be continuously doubled as long as T never exceeds temp by more than 0.1.
> The motivation for the test is clear: to increase iter until the
> time attributable to the upc_all_broadcast() calls (T - temp) reaches
> at least 0.1 seconds.  The problem in the logic, however, is that if the
> cost of upc_all_broadcast() (at low message sizes, in particular)
> is always less than the cost of two barrier calls, this loop will
> keep doubling 'iter' ad infinitum.  That's what happens when
> I try to run this code, compiled with GCC/UPC on an SMP-based
> system.  An alternative might be to increase the number of
> iterations until the total time taken exceeds some threshold,
> say 10 seconds.  Then for any reasonable implementation of
> upc_barrier you can assume that its impact on the total time
> is not significant.  Something like this:
>
> #define MIN_TEST_TIME 10.0
>
>      while (flag)
>        {
>          upc_barrier;
>          start = get_time ();
>          for (i = 0; i < iter; i++)
>            {
>              upc_barrier;
>              upc_all_broadcast (&B[0].y[0], A, mess_size * sizeof (int),
>                                 UPC_IN_NOSYNC | UPC_OUT_NOSYNC);
>              upc_barrier;
>            }
>          T = get_time () - start;
>          upc_barrier;
>
>          if (MYTHREAD == 0)
>            {
>              /* [...] */
>
>              if (T < MIN_TEST_TIME)
>                {
>                  iter = iter * 2;
>                }
>
> 3. This code worries me a bit:
>
>          for (i = 0; i < iter; i++)
>            {
>              upc_barrier;
>              upc_all_broadcast (&B[0].y[0], A, mess_size * sizeof (int),
>                                 UPC_IN_NOSYNC | UPC_OUT_NOSYNC);
>              upc_barrier;
>            }
>
> - The upc_all_broadcast() call above is being executed concurrently
> by all threads.  That is, they are all attempting to distribute A
> across B at the same time.  This is not a realistic use of broadcast.
>
> The following implementation ensures that only one thread executes
> a broadcast at a given time:
>
>          int i, t;
>          for (i = 0; i < iter; i++)
>            {
>              for (t = 0; t < THREADS; ++t)
>                {
>                  upc_barrier;
>                  if (t == MYTHREAD)
>                    {
>                      upc_all_broadcast (&B[0].y[0], A,
>                                         mess_size * sizeof (int),
>                                         UPC_IN_NOSYNC | UPC_OUT_NOSYNC);
>                    }
>                  upc_barrier;
>                }
>            }
>
> You might need to normalize your results by dividing by the number of
> threads at the end of each test run, if you're interested in
> upc_all_broadcast() times as a function of message size only.
>
> - The test declares A as a vector dynamically allocated on thread 0.
> Thus, the broadcast above is always copying from thread 0's shared space
> into all the other threads' shared space.  More typically, A would have
> affinity to the calling thread.  If you declare A as being local
> to a thread (dropping the "* shared" in the current implementation):
>
> shared[] int *A;
>
> and then make this call in each thread, rather than just thread 0:
>
>      if (MYTHREAD == 0)
>        {
>          flag = 1;
>
>          B = upc_global_alloc (THREADS, mess_size * sizeof (int));
>
>        }
>      /* All threads allocate their own 'A' */
>      A = (shared [] int *) upc_alloc (mess_size * sizeof (int));
>      for (i = 0; i < mess_size; i++)
>        {
>          A[i] = i + 1;
>        }
>      upc_barrier;
>
> this will be a more typical use of broadcast.
>
> - This can be simplified:
>              upc_all_broadcast (&B[0].y[0], A, mess_size * sizeof (int),
>                                 UPC_IN_NOSYNC | UPC_OUT_NOSYNC);
>
> to:
>              upc_all_broadcast (B, A, mess_size * sizeof (int),
>                                 UPC_IN_NOSYNC | UPC_OUT_NOSYNC);
>
>
> Hopefully, incorporation of some/all of the suggestions above will lead
> to a more robust test.
>
> - Gary
>