Port-sparc archive


Re: [ntp:questions] Ntpd in uninterruptible sleep?



Hi everyone,

Below is the most recent conversation I've been having on the ntpd list while trying to resolve an issue with ntpd locking up on my IPX. After much back and forth with various debugging data, Dave Hart appears to have identified the bug as being inside NetBSD's copy of libc and its dtoa(). So the thread has been brought over here to continue the discussion and try to resolve the issue.

The text below includes a gdb stack trace of ntpd at the point where it is stuck and running at nearly 100% CPU. The hang occurs after ntpd has been running for some period of time. Two separate stack traces (the one quoted below and another taken afterwards) show exactly the same stack, ending in libc.



On 11/11/2011 19:19, Dave Hart wrote:
On Fri, Nov 11, 2011 at 20:23, A C <agcarver+ntp%acarver.net@localhost> wrote:
First attempt with gdb and a back trace after attaching gdb to the hung
process (note this particular running of ntpd was not using the debug
command line options):

#0  0x103d1458 in .umul () from /usr/lib/libc.so.12
#1  0x103c38d4 in __pow5mult_D2A () from /usr/lib/libc.so.12
#2  0x103c3ac4 in __muldi3 () from /usr/lib/libc.so.12
#3  0x103c34dc in __mult_D2A () from /usr/lib/libc.so.12
#4  0x103c3728 in __pow5mult_D2A () from /usr/lib/libc.so.12
#5  0x103b61d4 in __dtoa () from /usr/lib/libc.so.12
#6  0x103b315c in __vfprintf_unlocked () from /usr/lib/libc.so.12
#7  0x103230c4 in snprintf () from /usr/lib/libc.so.12
#8  0x00023afc in ctl_putarray (tag=<value optimized out>, arr=0xa8fe0, start=1) at ntp_control.c:1307
#9  0x00024a7c in ctl_putpeer (varid=30, peer=0xa8e70) at ntp_control.c:1777
#10 0x0002744c in read_variables (rbufp=0x1050d000, restrict_mask=0) at ntp_control.c:2334
#11 0x0002664c in process_control (rbufp=0x1050d000, restrict_mask=0) at ntp_control.c:809
#12 0x00035594 in receive (rbufp=0x1050d000) at ntp_proto.c:370
#13 0x00022c00 in ntpdmain (argc=<value optimized out>, argv=<value optimized out>) at ntpd.c:1150
#14 0x0001381c in ___start ()
#15 0x00013754 in _start ()

Excellent.  I assume the stack trace is from ntpd 4.2.6p3.  I think
you've found a bug in your system's libc dtoa() exposed by its
snprintf(s, " %.2f", ...).  I believe you will not be able to
reproduce the bug using 4.2.7, as that version of ntpd uses
C99-snprintf [1] if the system snprintf() is not C99-compliant.
C99-snprintf's rpl_vsnprintf() does not use dtoa(), it hand-rolls the
double-to-ascii conversion.  Below is the code in ntpd.  NTP_SHIFT is
8.  I claim the ntpd code is correct and your system's dtoa() and
thereby snprintf() of double (floating point) is subject to infinite
looping for some values.

I suggest we move this discussion to the appropriate NetBSD mailing
list.  Please cc me, and I'll subscribe.

/*
 * ctl_putarray - write a tagged eight element double array into the response
 */
static void
ctl_putarray(
        const char *tag,
        double *arr,
        int start
        )
{
        register char *cp;
        register const char *cq;
        char buffer[200];
        int i;
        cp = buffer;
        cq = tag;
        while (*cq != '\0')
                *cp++ = *cq++;
        i = start;
        do {
                if (i == 0)
                        i = NTP_SHIFT;
                i--;
                NTP_INSIST((cp - buffer) < sizeof(buffer));
                snprintf(cp, sizeof(buffer) - (cp - buffer),
                         " %.2f", arr[i] * 1e3);
                cp += strlen(cp);
        } while(i != start);
        ctl_putdata(buffer, (unsigned)(cp - buffer), 0);
}

[1] http://www.jhweiss.de/software/snprintf.html

Cheers,
Dave Hart
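
To exercise the same libc path outside ntpd, something along the lines of the sketch below may help. It is not ntpd code: format_array() and SHIFT are hypothetical names, with SHIFT standing in for NTP_SHIFT (8, per Dave's message). It walks the array in the same order as ctl_putarray() and formats each value with the same " %.2f" conversion. The trace does not show which double values were being formatted when the hang occurred, so the inputs are left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SHIFT 8		/* assumption: stands in for NTP_SHIFT */

/*
 * format_array - hypothetical stand-in for ctl_putarray(): writes the tag
 * followed by SHIFT values scaled by 1e3, visiting the array in the same
 * order as the ntpd loop (start-1 down to 0, then SHIFT-1 down to start).
 * Returns the number of bytes written, not counting the terminating NUL.
 */
static size_t
format_array(const char *tag, const double *arr, int start,
	     char *out, size_t outlen)
{
	size_t used;
	int i, n;

	/* As in the original loop, start must be in 0..SHIFT-1 to terminate. */
	assert(start >= 0 && start < SHIFT);

	n = snprintf(out, outlen, "%s", tag);
	if (n < 0 || (size_t)n >= outlen)
		return 0;
	used = (size_t)n;

	i = start;
	do {
		if (i == 0)
			i = SHIFT;
		i--;
		/* Same conversion that reaches __dtoa() in the backtrace. */
		n = snprintf(out + used, outlen - used, " %.2f",
			     arr[i] * 1e3);
		if (n < 0 || (size_t)n >= outlen - used) {
			out[used] = '\0';	/* drop the truncated piece */
			break;
		}
		used += (size_t)n;
	} while (i != start);

	return used;
}
```

Calling format_array() from a small main() with candidate values (for example, filter values captured from a running ntpd) and a 200-byte buffer should show whether the system snprintf() returns at all for those inputs. If it never returns, attaching gdb as above should give a comparable backtrace.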



