Subject: bin/34200: timed occasionally goes into infinite loop
To: None <gnats-admin@netbsd.org, netbsd-bugs@netbsd.org>
From: Tim Rightnour <root@polaris.garbled.net>
List: netbsd-bugs
Date: 08/14/2006 19:05:13
>Number:         34200
>Category:       bin
>Synopsis:       timed occasionally goes into infinite loop
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Aug 14 19:05:00 +0000 2006
>Originator:     Tim Rightnour
>Release:        NetBSD 3.0
>Organization:
	
>Environment:
	
	
System: NetBSD polaris.garbled.net 3.0 NetBSD 3.0 (GENERIC) #0: Mon Dec 19 01:04:02 UTC 2005 builds@works.netbsd.org:/home/builds/ab/netbsd-3-0-RELEASE/i386/200512182024Z-obj/home/builds/ab/netbsd-3-0-RELEASE/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
Every few weeks I find that the CPU is pegged on my timed master server.
Investigation usually shows that timed is eating all the CPU on the box.  I
attempted to debug it a little, and this is what I turned up:

0x0804b4e7 in median ()
(gdb) where
#0  0x0804b4e7 in median ()
#1  0x0804b3b8 in networkdelta ()
#2  0x0804a7f8 in synch ()
#3  0x0804a358 in master ()
#4  0x0804dc86 in main ()
#5  0x080490f6 in ___start ()

It appears to be stuck somewhere in that function, looping endlessly.  If I
attach ktrace to it, it never produces any output, so the process is not
making any system calls while it spins.  I suspect it has gotten hold of
some odd values and is trying endlessly to average them.

	
>How-To-Repeat:
No idea.  I've been running timed for years on this system and never seen this.
Maybe one of my more recently added client machines is triggering it, or it
has to do with the number of boxes on the network?
	
>Fix:
Not sure.  However, I suspect that the following loop in
networkdelta.c:median() is where we are going wrong:

        for (pass = 1; ; pass++) {      /* loop over the data */

I suppose that, given certain values, it might never exit that loop, since
the for statement itself has no termination condition.  I am unsure what
those values might be.  Maybe we need to put a maximum cap on passes?
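
To make the idea of a cap on passes concrete, here is a minimal sketch.  It is
NOT the median() from networkdelta.c; it is a simplified, hypothetical
bisection search (the names median_capped and max_passes are made up for
illustration).  It only shows the shape of the fix: bound the loop and return
an error when the bound is hit, rather than spinning.

```c
#include <stddef.h>

/*
 * Hypothetical illustration only; this is not the median() from
 * timed's networkdelta.c.  It bisects toward the median of x[0..n-1]
 * and gives up after max_passes passes instead of looping forever.
 * Returns 0 and stores the estimate in *out on success, -1 otherwise.
 */
int
median_capped(const long *x, size_t n, double eps, int max_passes,
    double *out)
{
	double lo, hi, mid;
	size_t i, below, above;
	int pass;

	if (n == 0 || out == NULL)
		return -1;

	/* Bracket the median between the smallest and largest sample. */
	lo = hi = (double)x[0];
	for (i = 1; i < n; i++) {
		if ((double)x[i] < lo)
			lo = (double)x[i];
		if ((double)x[i] > hi)
			hi = (double)x[i];
	}

	for (pass = 1; pass <= max_passes; pass++) {	/* bounded loop */
		mid = lo + (hi - lo) / 2.0;
		if (hi - lo <= eps) {		/* bracket is tight enough */
			*out = mid;
			return 0;
		}
		below = above = 0;
		for (i = 0; i < n; i++) {
			if ((double)x[i] < mid)
				below++;
			else if ((double)x[i] > mid)
				above++;
		}
		if (below > n / 2)
			hi = mid;		/* median lies below mid */
		else if (above > n / 2)
			lo = mid;		/* median lies above mid */
		else {
			*out = mid;		/* neither side holds a majority */
			return 0;
		}
	}
	return -1;	/* cap reached without converging */
}
```

With a scheme like this, a caller could treat a -1 return as "this round of
measurements did not converge", log it, and skip the adjustment.  Whether
that is the right recovery for timed is an open question; the real function
may need a different fallback.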

	

>Unformatted: