On Fri, Mar 9, 2012 at 17:02, David Brownlee <abs%netbsd.org@localhost> wrote:
On 7 March 2012 22:29, AGC <agcarver+netbsd%acarver.net@localhost> wrote:
On 3/7/2012 13:57, Dave Hart wrote:
Yes, when it wedges it uses upwards of 80-90% CPU according to top but is
otherwise unresponsive unless issued a SIGKILL, SIGTERM, or SIGHUP. I attach
gdb to it while it's spinning the CPU, and that's where I get the stack trace.
Interestingly, a simple SIGTERM will break the loop and ntpd will actually
close out normally -- it issues the exit messages in the logs about
releasing kernel discipline and then the message "exit on signal 15" (or
whatever the number is for a kill command with no signal flag, I don't
recall).
Apologies if I've misunderstood something here, but if the hand-rolled
snprintf does not use dtoa() and so avoids the issue, would it make
sense to try one of:
a) Modify the hand-rolled snprintf to call dtoa(), to confirm that the
dtoa() calls are the culprit, and have it keep a static fd open and
write out the argument before each dtoa() call. If it hangs, you have a
history of dtoa() calls which you can replay in a test app to see
which bit pattern or sequence of bit patterns causes the issue.
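As a rough illustration of the logging half of option a), here is a minimal sketch, not the actual ntpd code. It assumes a 64-bit IEEE 754 double and POSIX open()/write(). The helper name log_double_bits() and the log format (one value per line, 16 hex digits) are hypothetical choices made for this example. The hex digits are formatted by hand because the code would sit inside the snprintf replacement itself and so should not call snprintf.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: descriptor is opened once and kept for later calls. */
static int log_fd = -1;

/*
 * Append the raw bit pattern of d to the log as 16 hex digits plus '\n'.
 * Assumes double is a 64-bit IEEE 754 value.
 * path is only used on the first call, when the log is opened.
 * Returns 0 on success, -1 on failure.
 */
static int
log_double_bits(const char *path, double d)
{
    static const char hexdigits[] = "0123456789abcdef";
    uint64_t bits;
    char line[17];
    int i;

    if (log_fd < 0) {
        log_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (log_fd < 0)
            return -1;
    }

    /* Copy the bytes rather than casting pointers, to avoid aliasing issues. */
    memcpy(&bits, &d, sizeof(bits));

    /* Most significant nibble first. */
    for (i = 0; i < 16; i++)
        line[i] = hexdigits[(bits >> (60 - 4 * i)) & 0xf];
    line[16] = '\n';

    /* Unbuffered write, so the value reaches the kernel before the
     * conversion routine is called. */
    if (write(log_fd, line, sizeof(line)) != (ssize_t)sizeof(line))
        return -1;
    return 0;
}
```

Calling log_double_bits() immediately before each dtoa() call means that, if the process then spins, the last line in the file should be the argument that was in flight.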
Attached and inlined below is a patch to ntpd's replacement snprintf()
to log floating point values as hex dumps to "printf_dtoa.log" then
call dtoa(), though it doesn't actually use dtoa's text conversion
result. AGC, if you apply this patch and rebuild ntpd (configured
with --enable-C99-snprintf) the log could be very helpful assuming it
eventually spins infinitely inside dtoa().
The code compiles, but this system has no dtoa() to test against,
so I haven't been able to test it.
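For the replay side, a test program could read such a log back and feed each value to the conversion routine. The following is only a sketch under assumptions: it expects one value per line as 16 hex digits of the IEEE 754 bit pattern (the patch's actual log format may differ), and parse_double_bits(), replay_log(), and the convert callback are hypothetical names. The callback would wrap whatever dtoa() call is under suspicion.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decode one log line of exactly 16 hex digits into the double it encodes.
 * Assumes double is a 64-bit IEEE 754 value.
 * Returns 0 on success, -1 if the line is not in the expected format.
 */
static int
parse_double_bits(const char *line, double *out)
{
    char *end;
    unsigned long long v;
    uint64_t bits;

    v = strtoull(line, &end, 16);
    if (end - line != 16)
        return -1;

    bits = (uint64_t)v;
    memcpy(out, &bits, sizeof(*out));
    return 0;
}

/*
 * Pass every value in the log at path to convert(), in order.
 * Each bit pattern is printed to stderr first, so if convert() never
 * returns, the last line printed identifies the offending value.
 * Returns the number of values replayed, or -1 if the log cannot be opened.
 */
static long
replay_log(const char *path, void (*convert)(double))
{
    FILE *f;
    char line[64];
    double d;
    long n = 0;

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof(line), f) != NULL) {
        if (parse_double_bits(line, &d) != 0)
            continue;   /* skip lines that are not 16 hex digits */
        fprintf(stderr, "replaying %.16s\n", line);
        convert(d);
        n++;
    }
    fclose(f);
    return n;
}
```

If a single value does not reproduce the hang, replaying the whole log in its original order would also cover the case where a sequence of calls is responsible.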
Cheers,
Dave Hart