Port-sparc archive


Re: ntpd wedged by libc?

On 3/18/2012 14:10, Christos Zoulas wrote:
On Mar 18, 12:47pm, agcarver+netbsd%acarver.net@localhost (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| Ok, it seems that the memory leak isn't occurring anymore but there may
| be an infinite loop problem.  Using the snprintf workaround in ntpd, it
| still wedged a couple days ago using over 80% CPU.  The memory usage
| jumped up some (from 5M to 10M) but it did not go beyond that even
| though I let it run in the wedged condition for hours.
| The back trace is below.  The daemon was running for about 12 days
| straight with no problems and then suddenly wedged.
| Could this be a threads issue?  Reading the header in
| libc/gdtoa/gdtoaimp.h I see that pow5mult is a multithreaded function.
| Is there a chance for cross thread clobbering or is it just a case of a
| termination condition not being met in a loop and letting things spin
| out of control?

Could we be looking at a compiler bug here?


I tried something when ntpd got hung today. I used the debugger to forcibly end the stuck call:

(gdb) bt
#0  0x103b5480 in __mult_D2A () from /usr/lib/libc.so.12
#1  0x103b56e4 in __pow5mult_D2A () from /usr/lib/libc.so.12
#2  0x103a7ca0 in __dtoa () from /usr/lib/libc.so.12
#3  0x103a50c0 in __vfprintf_unlocked () from /usr/lib/libc.so.12
#4  0x103a64f8 in vfprintf () from /usr/lib/libc.so.12
#5  0x103a184c in fprintf () from /usr/lib/libc.so.12
#6 0x000424bc in record_loop_stats (offset=6.920439999997807e-05, freq=-7.7587471658325353e-05, jitter=0.00012207031250000005, wander=1.2000280120993315e-09, spoll=4) at ntp_util.c:584
#7 0x00031760 in local_clock (peer=0xb29a0, fp_offset=6.920439999997807e-05) at ntp_loopfilter.c:666
#8  0x00036524 in clock_select () at ntp_proto.c:1851
#9 0x00037024 in clock_filter (peer=0xb29a0, sample_offset=6.920439999997807e-05, sample_delay=<value optimized out>, sample_disp=0.0001220703125) at ntp_proto.c:2360
#10 0x0003babc in refclock_receive (peer=0xb29a0) at ntp_refclock.c:556
#11 0x0003be50 in refclock_transmit (peer=0xb29a0) at ntp_refclock.c:335
#12 0x00041724 in timer () at ntp_timer.c:320
#13 0x000233cc in ntpdmain (argc=0, argv=0xefffec10) at ntpd.c:1026
#14 0x0001382c in ___start ()
#15 0x00013764 in _start ()

(gdb) return
Make selected stack frame return now? (y or n) y
#0  0x103a7ca0 in __dtoa () from /usr/lib/libc.so.12

This seems to have gotten it unstuck because ntpd started running normally again as soon as I exited gdb.

It was in the middle of writing a line to the log file when it hung. The log around that point looks like this (not that I think it's terribly useful):

56019 40827.814  0.000070325 -77.589  0.000122070  0.001219 4
56019 40843.823  0.\x00\x00\x00 -77.587  0.000122070  0.001200 4
56019 50415.027  0.000000000 -77.587  0.000122070  0.001123 6

So it appears there's an infinite loop occurring in __mult_D2A (or possibly above it in __pow5mult_D2A).
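For anyone who wants to poke at this outside of ntpd, here is a minimal sketch that pushes the same values from frame #6 through the printf floating-point path. The format string is an assumption inferred from the field widths in the log lines above; it is not copied from ntp_util.c. The names format_loopstats and stress_loopstats are hypothetical helpers, not ntpd functions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from ntp_util.c): format the five numeric
 * loopstats fields.  The format string is an assumption inferred from
 * the log lines; freq and wander are scaled by 1e6 to match the values
 * seen in the log (-77.587 and 0.001200).
 */
int
format_loopstats(char *buf, size_t len, double offset, double freq,
    double jitter, double wander, int spoll)
{
	return snprintf(buf, len, "%.9f %.3f %.9f %.6f %d",
	    offset, freq * 1e6, jitter, wander * 1e6, spoll);
}

/*
 * Format the values from the backtrace repeatedly and count how many
 * results differ from the first one.  A hang in the conversion would
 * show up as this function never returning.
 */
long
stress_loopstats(long iterations)
{
	const double offset = 6.920439999997807e-05;
	const double freq = -7.7587471658325353e-05;
	const double jitter = 0.00012207031250000005;
	const double wander = 1.2000280120993315e-09;
	char first[128], buf[128];
	long mismatches = 0;
	long i;

	format_loopstats(first, sizeof(first), offset, freq, jitter,
	    wander, 4);
	for (i = 0; i < iterations; i++) {
		format_loopstats(buf, sizeof(buf), offset, freq, jitter,
		    wander, 4);
		if (strcmp(first, buf) != 0)
			mismatches++;
	}
	return mismatches;
}
```

On a libc without the problem, the formatted string for these inputs should be "0.000069204 -77.587 0.000122070 0.001200 4". The last three fields match the truncated log line, which suggests the hang hit while the offset was being converted. Since ntpd ran for about 12 days before wedging, a short loop like this may well not reproduce it.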
