Re: ntpd wedged by libc?

To: Christos Zoulas <christos%zoulas.com@localhost>
Subject: Re: ntpd wedged by libc?
From: AGC <agcarver+netbsd%acarver.net@localhost>
Date: Mon, 02 Apr 2012 12:29:25 -0700

On 3/18/2012 14:10, Christos Zoulas wrote:

On Mar 18, 12:47pm, agcarver+netbsd%acarver.net@localhost (AGC) wrote:
-- Subject: Re: ntpd wedged by libc?

| Ok, it seems that the memory leak isn't occurring anymore but there may
| be an infinite loop problem.  Using the snprintf workaround in ntpd, it
| still wedged a couple days ago using over 80% CPU.  The memory usage
| jumped up some (from 5M to 10M) but it did not go beyond that even
| though I let it run in the wedged condition for hours.
|
| The back trace is below.  The daemon was running for about 12 days
| straight with no problems and then suddenly wedged.
|
| Could this be a threads issue?  Reading the header in
| libc/gdtoa/gdtoimp.h I see that pow5mult is a multithreaded function.
| Is there a chance for cross thread clobbering or is it just a case of a
| termination condition not being met in a loop and letting things spin
| out of control?

Could we be looking at a compiler bug here?

christos

I tried something when ntpd got hung today. I used the debugger to forcibly end the stuck call:


(gdb) bt
#0  0x103b5480 in __mult_D2A () from /usr/lib/libc.so.12
#1  0x103b56e4 in __pow5mult_D2A () from /usr/lib/libc.so.12
#2  0x103a7ca0 in __dtoa () from /usr/lib/libc.so.12
#3  0x103a50c0 in __vfprintf_unlocked () from /usr/lib/libc.so.12
#4  0x103a64f8 in vfprintf () from /usr/lib/libc.so.12
#5  0x103a184c in fprintf () from /usr/lib/libc.so.12
#6  0x000424bc in record_loop_stats (offset=6.920439999997807e-05, freq=-7.7587471658325353e-05, jitter=0.00012207031250000005, wander=1.2000280120993315e-09, spoll=4) at ntp_util.c:584
#7  0x00031760 in local_clock (peer=0xb29a0, fp_offset=6.920439999997807e-05) at ntp_loopfilter.c:666
#8  0x00036524 in clock_select () at ntp_proto.c:1851
#9  0x00037024 in clock_filter (peer=0xb29a0, sample_offset=6.920439999997807e-05, sample_delay=<value optimized out>, sample_disp=0.0001220703125) at ntp_proto.c:2360
#10 0x0003babc in refclock_receive (peer=0xb29a0) at ntp_refclock.c:556
#11 0x0003be50 in refclock_transmit (peer=0xb29a0) at ntp_refclock.c:335
#12 0x00041724 in timer () at ntp_timer.c:320
#13 0x000233cc in ntpdmain (argc=0, argv=0xefffec10) at ntpd.c:1026
#14 0x0001382c in ___start ()
#15 0x00013764 in _start ()

(gdb) return
Make selected stack frame return now? (y or n) y
#0  0x103a7ca0 in __dtoa () from /usr/lib/libc.so.12

This seems to have gotten it unstuck, because ntpd started running normally again as soon as I exited gdb.

It was in the middle of writing to a file when the code bombed. The part of the log file around the failure was (not that I think it's terribly useful):


56019 40827.814  0.000070325 -77.589  0.000122070  0.001219 4
56019 40843.823  0.\x00\x00\x00 -77.587  0.000122070  0.001200 4
56019 50415.027  0.000000000 -77.587  0.000122070  0.001123 6

So it appears there's an infinite loop occurring in __mult_D2A (or possibly above it in __pow5mult_D2A).
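
For anyone who wants to poke at this outside of ntpd, here is a rough sketch in C that pushes the values from frame #6 through the same kind of formatting over and over. The names format_loopstats and stress_loopstats are made up for illustration, and the format string is only my guess from the log lines above, not copied from ntp_util.c. It uses snprintf instead of fprintf so the result can be compared. It is single threaded, so it would not show a cross-thread problem.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of ntpd: format one loopstats-style
 * record into buf.  The format string is an assumption inferred from
 * the log lines, with freq and wander scaled by 1e6 to match them.
 * Returns the snprintf() result.
 */
static int
format_loopstats(char *buf, size_t len, double offset, double freq,
    double jitter, double wander, int spoll)
{
	return snprintf(buf, len, "%.9f %.3f %.9f %.6f %d",
	    offset, freq * 1e6, jitter, wander * 1e6, spoll);
}

/*
 * Format the values seen in frame #6 of the backtrace `iterations`
 * times and compare every result against the first one.  Returns the
 * number of iterations whose output differed or failed; 0 means every
 * pass produced the same string.  The last result is left in buf.
 */
static unsigned long
stress_loopstats(unsigned long iterations, char *buf, size_t len)
{
	char first[128];
	unsigned long i, bad = 0;
	int n;

	n = format_loopstats(first, sizeof(first), 6.920439999997807e-05,
	    -7.7587471658325353e-05, 0.00012207031250000005,
	    1.2000280120993315e-09, 4);
	if (n < 0 || (size_t)n >= sizeof(first))
		return iterations;	/* could not build a reference line */

	for (i = 0; i < iterations; i++) {
		n = format_loopstats(buf, len, 6.920439999997807e-05,
		    -7.7587471658325353e-05, 0.00012207031250000005,
		    1.2000280120993315e-09, 4);
		if (n < 0 || (size_t)n >= len || strcmp(buf, first) != 0)
			bad++;
	}
	return bad;
}
```

Calling stress_loopstats from a small main() with a large iteration count on the affected machine should either finish with a count of 0 or hang, and if it hangs gdb can be attached the same way as above. A clean run would not rule out the bug, since ntpd ran for about 12 days before wedging.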
