Subject: gettimeofday()/BIND/NetBSD/m68k BUG?
To: None <port-mac68k@netbsd.org>
From: Joe Laffey <joe@laffeycomputer.com>
List: port-mac68k
Date: 02/12/2001 14:02:56
I am not sure whether this is a bug in gettimeofday(), but I believe it
is. If it is not, then there is a bug in BIND 8.2.3 (the latest version 8
patch release). I discovered this because BIND died unexpectedly a couple
of times on my old Mac 68K NetBSD box. In my tests this only affects
NetBSD/mac68k, but it is worth checking other systems and OSes.

gettimeofday() is supposed to fill in a timeval struct:

struct timeval {
        long    tv_sec;         /* seconds since Jan. 1, 1970 */
        long    tv_usec;        /* and microseconds */
};

Under NetBSD 1.3.2 and 1.4.2 mac68k (and possibly others), the tv_usec
field, which is supposed to hold the number of microseconds (0 to
999999), is sometimes equal to 1000000. Every other implementation I have
tested would instead increment the tv_sec field and set tv_usec to 0
(since 1000000 microseconds == 1 second).

This is not the only file with the problem, but here is an example: line
114 of lib/isc/ev_timers.c in BIND 8.2.3 contains the following code:

struct timespec
evNowTime() {
        struct timeval now;

        if (gettimeofday(&now, NULL) < 0)
                return (evConsTime(0, 0));
        INSIST(now.tv_usec >= 0 && now.tv_usec < 1000000);
        return (evTimeSpec(now));
}

INSIST() aborts with an error message if its condition is false, so this
code fails whenever tv_usec == 1000000. That DOES happen on the
OS/platform mentioned above. You can test your own setup by running the
following ugly-hacked-out test program for a while. On my NetBSD systems
it fails within a few seconds, and these are old 68k boxen!

Test program:

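#include <sys/time.h>
#include <stdio.h>

int main(void)
{
        struct timeval now;

        /* Poll the clock until an out-of-range tv_usec shows up. */
        while (1) {
                if (gettimeofday(&now, NULL) < 0) {
                        printf("bad retval\n");
                        return 1;
                }
                /* Valid values are 0..999999; 1000000 is the bug. */
                if (now.tv_usec < 0 || now.tv_usec >= 1000000) {
                        printf("Whoa! Bad mojo! tv_usec=%ld\n",
                            (long)now.tv_usec);
                        return 1;
                }
        }
}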



Solutions:

Replace the INSIST() calls (and the assert() calls that other files use)
with something like this:

        if (!(now.tv_usec >= 0 && now.tv_usec < 1000000)) {
                if (now.tv_usec == 1000000) {
                        /* 1000000 usec is exactly one second: carry it. */
                        now.tv_sec++;
                        now.tv_usec = 0;
                } else {
                        /* Any other value is truly bogus; this always fails. */
                        INSIST(now.tv_usec >= 0 && now.tv_usec < 1000000);
                }
        }

The following files in the BIND 8.2.3 release seem to suffer from this
problem (perhaps others, but these are what I found):

bin/named/ns_glue.c
bin/dig/dig.c
lib/dst/rsaref_link.c
lib/isc/ev_timers.c


Whether this is a BIND bug or a gettimeofday() bug is not entirely clear
(my guess is gettimeofday()). Either way, enough people are affected that
the BIND source should code defensively around it. The extra check could
be wrapped in #ifdefs so that it only applies to affected systems.

Anyone who gets the "Whoa! Bad mojo!" message from the test above needs to
apply a workaround like the one above if they want BIND to run reliably on
their system.

I tested this on the following systems that I have access to, and the
problem did not show up on any of them:
BSDI BSD/OS 3.0 (Intel)
Red Hat 6.x (Intel), glibc 2.1.3xx
LinuxPPC (G3 Mac), glibc 2.x



I would really be interested in hearing whether this bug is in all ports
of NetBSD or just the mac68k port. (I could not find the right source
file on cvsweb.netbsd.org... Am I blind?)

NO WARRANTY on any of this info, etc., etc.

Have fun,

Joe Laffey
LAFFEY Computer Imaging
St. Louis, MO
-------------------------