bin/47431: nanosleep is more like millisleep

To: gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: bin/47431: nanosleep is more like millisleep
From: dholland%eecs.harvard.edu@localhost
Date: Thu, 10 Jan 2013 23:25:00 +0000 (UTC)

>Number:         47431
>Category:       bin
>Synopsis:       nanosleep is more like millisleep
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jan 10 23:25:00 +0000 2013
>Originator:     David A. Holland
>Release:        NetBSD 6.99.11 (20120906)
>Organization:
>Environment:
System: NetBSD macaran 6.99.11 NetBSD 6.99.11 (MACARAN) #15: Mon Oct 15 
21:24:31 EDT 2012 dholland@macaran:/usr/src/sys/arch/amd64/compile/MACARAN amd64
Architecture: x86_64
Machine: amd64
>Description:

nanosleep(2) (and its related forms) does an exceptionally poor job.

While NetBSD is not a realtime system and nothing is particularly
guaranteed, the current nanosleep behavior is near-useless and we can
and should do better.

I wrote a simple test program to measure how long nanosleep actually
sleeps, and ascertained the following (on an otherwise idle system
with plenty of spare cores):

   - nanosleep(0) apparently doesn't sleep, but nearly always takes
     several milliseconds to return; this seems excessive, even if
     we assume part of that time is actually being used calling
     clock_gettime().

   - For all time values > 0 and <= 10 ms, the resulting delay is
     almost always 19.99 ms. I assume this is two scheduler quanta,
     since I notice that HZ is still 100.

   - (I thought x86 had changed to HZ=1000 some time back, but
     apparently not.)

   - For all time values >= 20 ms, the resulting delay is nearly
     always the requested time plus 9.99 ms, or sometimes a bit more
     than that. That is, even if the requested delay is an integer
     number of scheduler quanta, we always sleep for one more.
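
The pattern in the measurements above can be summarized as a small
model. This is only a sketch of the observed behavior, not the kernel's
actual code: the names MODEL_HZ, TICK_NS and model_delay_ns are
hypothetical, and the model assumes HZ=100 (a 10 ms tick) and ignores
the several milliseconds of overhead seen for nanosleep(0).

```c
#include <stdint.h>

/* Assumption: HZ is 100, as observed in the report, so a tick is 10 ms. */
#define MODEL_HZ 100
#define TICK_NS (1000000000ULL / MODEL_HZ)

/*
 * Hypothetical model of the measured delay: round the request up to a
 * whole number of ticks, then add one more tick.  A zero request is
 * modeled as no sleep at all.
 */
static uint64_t model_delay_ns(uint64_t requested_ns)
{
    uint64_t ticks;

    if (requested_ns == 0)
        return 0;
    ticks = (requested_ns + TICK_NS - 1) / TICK_NS;  /* round up */
    return (ticks + 1) * TICK_NS;                    /* one extra tick */
}
```

Under this model a 5 ms request takes two ticks (20 ms) and a 20 ms
request takes three (30 ms), which roughly matches the 19.99 ms and
"requested plus 9.99 ms" figures above.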

Ostensibly if one wants to sleep for small amounts of time, one is
supposed to busy-loop; this is fine. However, nanosleep is supposed to
do this for me, and do it in the kernel where ready access to
fine-grained timing is available. This is arguably the whole point of
nanosleep(*). Furthermore, in general only the kernel can know the
length of time at which sleeping should give way to spinning.
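
For illustration, here is roughly what the sleep-then-spin approach
looks like when done in userland. It is a sketch only: the helper names
now_ns and hybrid_sleep_ns are hypothetical, and COARSE_MARGIN_NS is a
guess (two 10 ms ticks, based on the overshoot measured above), which
is exactly the kind of number an application has no good way to know.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Assumption: a coarse sleep may overshoot by up to two 10 ms ticks. */
#define COARSE_MARGIN_NS 20000000ULL

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: sleep coarsely, then busy-wait to the deadline. */
static void hybrid_sleep_ns(uint64_t nsecs)
{
    uint64_t deadline = now_ns() + nsecs;

    if (nsecs > COARSE_MARGIN_NS) {
        uint64_t coarse = nsecs - COARSE_MARGIN_NS;
        struct timespec ts;

        ts.tv_sec = (time_t)(coarse / 1000000000ULL);
        ts.tv_nsec = (long)(coarse % 1000000000ULL);
        nanosleep(&ts, NULL);   /* may return late, or early on a signal */
    }
    while (now_ns() < deadline)
        continue;               /* spin for the remainder */
}
```

Requests shorter than the margin are spun for entirely, which burns
CPU; choosing the margin well needs knowledge of the tick length and
scheduler that the kernel has and the caller does not.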

Even if for some reason nanosleep cannot be fixed to spin when needed,
the behavior where it always tacks on one extra scheduler quantum
(thus always taking two for very short sleeps) is particularly silly
and can and should be fixed.

However, being able to sleep for short periods of time is useful in a
number of contexts, and I would think we ought to make a credible
best-effort attempt to support it.

(Also, is there any reason we haven't gone to HZ=1000 for at least
x86? Other OSes did it years ago.)


(*) Or at least, it was when nanosleep was introduced, to the best of
my recollection. If this behavior has been explicitly prohibited by
standards in the meantime, please point me at C&V.

>How-To-Repeat:

Here is the test program:

   ---- nanoslap.c ----
#include <sys/types.h>
#include <sys/time.h>   /* timespecsub() */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>       /* nanosleep(), clock_gettime() */
#include <unistd.h>

static void printheader(void) {
   printf("Requested       Experienced     Excess          Error\n");
}

static void printtime(char *buf, size_t max, const struct timespec *tv) {
   snprintf(buf, max, "%lld.%09ld", (long long) tv->tv_sec, (long) tv->tv_nsec);
}

static unsigned long long getnsecs(const struct timespec *tv) {
   return (tv->tv_sec * (unsigned long long) 1000000000) + tv->tv_nsec;
}

static void testone(unsigned long long nsecs) {
   struct timespec requested, start, end, experienced, excess;
   unsigned long long a, b, c;
   char buf[32];

   if (nsecs >= 1000000000) {
      requested.tv_sec = nsecs / 1000000000;
      requested.tv_nsec = nsecs % 1000000000;
   }
   else {
      requested.tv_sec = 0;
      requested.tv_nsec = nsecs;
   }

   clock_gettime(CLOCK_MONOTONIC, &start);
   nanosleep(&requested, NULL);
   clock_gettime(CLOCK_MONOTONIC, &end);

   timespecsub(&end, &start, &experienced);
   timespecsub(&experienced, &requested, &excess);

   printtime(buf, sizeof(buf), &requested);
   printf("%-16s", buf);

   printtime(buf, sizeof(buf), &experienced);
   printf("%-16s", buf);

   printtime(buf, sizeof(buf), &excess);
   printf("%-16s", buf);

   a = getnsecs(&requested);
   b = getnsecs(&experienced);
   c = getnsecs(&excess);

   if (a == 0) {
      printf("---");
   }
   else if (b > 2*a) {
      printf("%g x", (double)b / (double)a);
   }
   else {
      printf("%g %%", (100.0*c) / (double)a);
   }
   printf("\n");
}

static void testall(void) {
   unsigned long x;
   unsigned k;

   testone(0);
   for (x = 1; x < 1000000000; x *= 10) {
      for (k = 1; k < 10; k++) {
         testone(x * k);
      }
   }
}

int main(int argc, char *argv[]) {
   int i;

   printheader();
   if (argc == 1) {
      testall();
   }
   else {
      for (i=1; i<argc; i++) {
         testone((unsigned long long) 1000000000 * atof(argv[i]));
      }
   }

   return 0;
}
   --------

Note that this system does not have clock_nanosleep() as Christos only
added it in October, but using it shouldn't make any difference.
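
On a system that does have clock_nanosleep(), the same measurement
could be made against an absolute deadline, along the following lines.
This is an untested-on-NetBSD sketch; the function name oversleep_ns is
hypothetical and it assumes a non-negative request.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Sleep until an absolute CLOCK_MONOTONIC deadline nsecs from now and
 * return how late the wakeup was, in nanoseconds.  Returns -1 if the
 * sleep failed (for example, if it was interrupted by a signal).
 */
static long long oversleep_ns(long long nsecs)
{
    struct timespec deadline, end;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += nsecs / 1000000000LL;
    deadline.tv_nsec += nsecs % 1000000000LL;
    if (deadline.tv_nsec >= 1000000000L) {
        /* carry nanoseconds into seconds */
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL) != 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (end.tv_sec - deadline.tv_sec) * 1000000000LL
        + (end.tv_nsec - deadline.tv_nsec);
}
```

The return value corresponds to the "Excess" column printed by the
test program above.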

>Fix:

dunno.
