Re: ntpd wedged by libc?
On 3/18/2012 14:10, Christos Zoulas wrote:
On Mar 18, 12:47pm, agcarver+netbsd%acarver.net@localhost (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?
| Ok, it seems that the memory leak isn't occurring anymore but there may
| be an infinite loop problem. Using the snprintf workaround in ntpd, it
| still wedged a couple days ago using over 80% CPU. The memory usage
| jumped up some (from 5M to 10M) but it did not go beyond that even
| though I let it run in the wedged condition for hours.
| The back trace is below. The daemon was running for about 12 days
| straight with no problems and then suddenly wedged.
| Could this be a threads issue? Reading the header in
| libc/gdtoa/gdtoaimp.h I see that pow5mult is a multithreaded function.
| Is there a chance for cross thread clobbering or is it just a case of a
| termination condition not being met in a loop and letting things spin
| out of control?
Could we be looking at a compiler bug here?
I tried something when ntpd got hung today. I used the debugger to
forcibly end the stuck call:
#0 0x103b5480 in __mult_D2A () from /usr/lib/libc.so.12
#1 0x103b56e4 in __pow5mult_D2A () from /usr/lib/libc.so.12
#2 0x103a7ca0 in __dtoa () from /usr/lib/libc.so.12
#3 0x103a50c0 in __vfprintf_unlocked () from /usr/lib/libc.so.12
#4 0x103a64f8 in vfprintf () from /usr/lib/libc.so.12
#5 0x103a184c in fprintf () from /usr/lib/libc.so.12
#6 0x000424bc in record_loop_stats (offset=6.920439999997807e-05,
spoll=4) at ntp_util.c:584
#7 0x00031760 in local_clock (peer=0xb29a0,
fp_offset=6.920439999997807e-05) at ntp_loopfilter.c:666
#8 0x00036524 in clock_select () at ntp_proto.c:1851
#9 0x00037024 in clock_filter (peer=0xb29a0,
sample_delay=<value optimized out>, sample_disp=0.0001220703125) at
#10 0x0003babc in refclock_receive (peer=0xb29a0) at ntp_refclock.c:556
#11 0x0003be50 in refclock_transmit (peer=0xb29a0) at ntp_refclock.c:335
#12 0x00041724 in timer () at ntp_timer.c:320
#13 0x000233cc in ntpdmain (argc=0, argv=0xefffec10) at ntpd.c:1026
#14 0x0001382c in ___start ()
#15 0x00013764 in _start ()
Make selected stack frame return now? (y or n) y
#0 0x103a7ca0 in __dtoa () from /usr/lib/libc.so.12
This seems to have gotten it unstuck because ntpd started running
normally again as soon as I exited gdb.
It was in the middle of writing to a file when the call hung. The
surrounding lines of the log file were (not that I think they're
terribly useful):
56019 40827.814 0.000070325 -77.589 0.000122070 0.001219 4
56019 40843.823 0.\x00\x00\x00 -77.587 0.000122070 0.001200 4
56019 50415.027 0.000000000 -77.587 0.000122070 0.001123 6
So it appears there's an infinite loop occurring in __mult_D2A (or
possibly above it in __pow5mult_D2A).
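
For anyone who wants to check their own stats files for the same kind of damage as the second log line above, here is a minimal sketch (not part of ntpd, and not something I ran for this report). It assumes each record has seven whitespace-separated numeric fields, as in the excerpt; the function name find_damaged_lines and the expected_fields parameter are made up for illustration.

```python
def find_damaged_lines(data: bytes, expected_fields: int = 7):
    """Return (line_number, reason) pairs for suspicious log records.

    Assumption: every valid record has `expected_fields` numeric,
    whitespace-separated fields, like the excerpt in this message.
    """
    damaged = []
    for lineno, raw in enumerate(data.splitlines(), start=1):
        if not raw.strip():
            continue  # ignore blank lines
        if b"\x00" in raw:
            # A NUL in the middle of a record means the formatted
            # number was never fully written.
            damaged.append((lineno, "embedded NUL byte"))
            continue
        fields = raw.split()
        if len(fields) != expected_fields:
            damaged.append((lineno, "unexpected field count"))
            continue
        try:
            for field in fields:
                float(field.decode("ascii"))
        except ValueError:  # also covers UnicodeDecodeError
            damaged.append((lineno, "non-numeric field"))
    return damaged


if __name__ == "__main__":
    import sys

    # Read in binary mode so NUL bytes survive untouched.
    with open(sys.argv[1], "rb") as handle:
        for lineno, reason in find_damaged_lines(handle.read()):
            print(f"line {lineno}: {reason}")
```

Run against the three lines quoted above, this would flag only the second one. It will not notice a record like the third, where the offset is a well-formed but implausible 0.000000000.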