Port-sparc archive


Re: ntpd wedged by libc?



On 3/7/2012 13:57, Dave Hart wrote:
On Wed, Mar 7, 2012 at 20:58, AGC<agcarver+netbsd%acarver.net@localhost>  wrote:
On 3/7/2012 09:14, Dave Hart wrote:
AGC, approximately how long does it take for ntpd to get wedged with
the latest libc?  And how many peers are there in your ntpq -p output?

The wedge time has varied.  Sometimes it happened within a few hours of
starting ntpd and other times it could take several days.  I could never
really predict when it would fail.

You can probably speed it up by pounding harder with ntpq. Use ntpq's -n
option to take out any DNS-related delays, and sleep less between
queries.
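A minimal sketch of such a stress loop in C, assuming a POSIX system with ntpq on the PATH. The helper name `pound_ntpd` and the delay value in the usage line are illustrative assumptions, not part of ntpd or ntpq. The loop runs the given command repeatedly through system(), pauses between runs, and counts runs that did not exit cleanly.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical stress helper: run `cmd` (e.g. "ntpq -n -p > /dev/null")
 * `iterations` times, pausing `delay_ms` milliseconds between runs.
 * Returns how many runs did not exit with status 0.
 */
int pound_ntpd(const char *cmd, int iterations, long delay_ms)
{
    struct timespec pause;
    pause.tv_sec = delay_ms / 1000;
    pause.tv_nsec = (delay_ms % 1000) * 1000000L;

    int failures = 0;
    for (int i = 0; i < iterations; i++) {
        /* system() returns the shell's wait status; 0 means exit code 0 */
        if (system(cmd) != 0)
            failures++;
        if (delay_ms > 0)
            nanosleep(&pause, NULL);
    }
    return failures;
}
```

For example, calling `pound_ntpd("ntpq -n -p > /dev/null", 100000, 100)` from a small main() issues a numeric peer query every tenth of a second; lowering the delay raises the query rate.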

There are seven peers total listed in the billboard.

So that's about 50 ntpq-related %.3f snprintf() calls roughly every 5 seconds.
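To exercise libc's %.3f conversion in isolation from ntpd, a standalone loop like the following could be run against the same libc. This is a hypothetical test, not something from the thread: the function name, seed, and value range are assumptions. A hang in the conversion would show up as the loop never returning; a wrong result would show up as a nonzero return value.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Format `count` pseudo-random doubles with "%.3f" and parse each result
 * back with strtod(). Returns the number of conversions that failed or
 * drifted by more than half of the last printed digit (0.0005).
 * The value range (about -1000 to 1000) is an assumption meant to resemble
 * offsets and delays in milliseconds.
 */
long stress_fmt3(long count, unsigned int seed)
{
    char buf[64];
    long bad = 0;

    srand(seed);
    for (long i = 0; i < count; i++) {
        double v = ((double)rand() / RAND_MAX - 0.5) * 2000.0;
        int n = snprintf(buf, sizeof buf, "%.3f", v);
        if (n < 0 || (size_t)n >= sizeof buf) {
            bad++;              /* encoding error or truncated output */
            continue;
        }
        double diff = strtod(buf, NULL) - v;
        if (diff < 0)
            diff = -diff;
        if (diff > 0.0005 + 1e-9)   /* small slack for strtod rounding */
            bad++;
    }
    return bad;
}
```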

What about the log and statistics files?  I believe they also use
various printf() calls, yes?

Assuming you're not running ntpd interactively with -D or -d options
for debug trace output, clockstats, peerstats and loopstats would be
the next most frequent users of snprintf with floating point, but
that's a handful of floating-point-to-text conversions at the rate of
once per poll, or as often as every 8 seconds for refclocks.  All the
traces you've provided have come through ntp_control.c, indicating that
ntpq-style NTP mode 6 queries triggered the failure.

Yes, I was specifically thinking of the general log file (no debugging, just general messages like the fuzz errors, popcorn, etc.) and the peerstats, clockstats, loopstats and sysstats files (I'm recording all of those).


Was ntpd using any CPU when it wedged?

Yes, when it wedges it uses upwards of 80-90% CPU according to top, but it is otherwise unresponsive until sent a SIGKILL, SIGTERM, or SIGHUP. I attach gdb to it while it's spinning the CPU; that's where I get the stack trace. Interestingly, a simple SIGTERM will break the loop and ntpd will actually shut down normally: it writes the exit messages to the log about releasing kernel discipline, followed by "exit on signal 15" (or whatever the number is for a kill command with no signal flag; I don't recall).
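For reference, `kill` with no signal option sends SIGTERM, which is signal 15 on NetBSD. The sketch below is not ntpd's code, and its names are hypothetical; it only illustrates the general mechanism by which a signal can break a CPU-bound loop. The kernel interrupts the spinning code to run the handler, which records the signal so the loop can stop and a normal shutdown path can run. Code stuck inside a libc routine never checks such a flag, so in a real daemon the handler itself would have to drive the exit; how ntpd does that is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written asynchronously by the handler; sig_atomic_t makes the store safe. */
static volatile sig_atomic_t caught_signal = 0;

static void on_term(int signo)
{
    caught_signal = signo;  /* keep handler work async-signal-safe */
}

/* Install on_term for SIGTERM and SIGHUP. Returns 0 on success, -1 on error. */
int install_term_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) == -1)
        return -1;
    if (sigaction(SIGHUP, &sa, NULL) == -1)
        return -1;
    return 0;
}

/* Hypothetical busy loop standing in for the wedged code path: it spins
   until the handler records a signal, then returns that signal number so
   the caller can log it and run its shutdown path. */
int spin_until_signal(void)
{
    while (caught_signal == 0) {
        /* burns CPU, much like the 80-90% seen in top */
    }
    return caught_signal;
}
```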
