Subject: Re: Bug found: help to isolate it
To: Lista de NetBSD Users <list10@sepc.edu.mx>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: netbsd-users
Date: 05/19/2002 15:10:40
On Sat, May 18, 2002 at 08:43:12PM -0500, Lista de NetBSD Users wrote:
> On Thu, 16 May 2002, Manuel Bouyer wrote:
> 
> > Hum, it looks like it doesn't use any CPU at this point, so it's likely not
> > the same bug.
> > Could you, when this occurs again
> > 1) do a ps -axl and post the syslog line
> > 2) keep a root shell around and kill -ABRT syslogd, to get a core dump?
> >
> > BTW, killing syslogd should make things work again
> 
> Thanks, Manuel...
> 
> Yesterday, I could see one more machine (1.5.2/i386) in our LAN
> with the same problem... and last night we had a power failure...
> sorry... the servers rebooted fine (with fsck) and the problem
> of syslogd disappeared.
> 
> I am compiling new kernels for some machines with the DDB option
> and preparing a shell to kill syslogd. I will use it next time.

OK

> 
> This problem is not new for me and I could say it occurs
> every 30 or 60 days in our LAN (we have 13+ boxes 1.5.x/i386)
> and I have seen it since two+ years ago (versions 1.4 and
> maybe 1.3.x)

BTW, I suspect you have some very special setup. I have > 20
NetBSD servers (i386, alpha, sparc), some of them with very long
uptimes (> 500 days), and the only time I've seen that was when someone
hit 'scroll lock' on the console. Unlocking it unwedged syslogd.

Can you give more details about your setup (hardware, console type, software
using syslogd, syslog.conf, etc.)?
To check which software is using syslogd you can use fstat. First find the
address of the unix socket:
rochebonne# fstat |grep syslog
root     syslogd      109 root /              2 drwxr-xr-x    1024 r 
root     syslogd      109   wd /              2 drwxr-xr-x    1024 r 
root     syslogd      109    0 /           4358 crw-rw-rw-    null rw
root     syslogd      109    1 /           4358 crw-rw-rw-    null rw
root     syslogd      109    2 /           4358 crw-rw-rw-    null rw
root     syslogd      109    3* unix dgram c07cb940
root     syslogd      109    4* internet6 dgram udp c0bde300 *:514
root     syslogd      109    5 /           4354 crw-------  console w 
root     syslogd      109    6 /           1150 -rw-r--r--   17041 w 
root     syslogd      109    7 /           1150 -rw-r--r--   17041 w 
root     syslogd      109    8 /           1104 -rw-------   26716 w 
root     syslogd      109    9 /            697 -rw-------    1275 w 
root     syslogd      109   10 /           1422 -rw-------       0 w 
root     syslogd      109   11 /           1277 -rw-r-----       0 w 
root     syslogd      109   12 /           1113 -rw-------    1329 w 
root     syslogd      109   13* internet dgram udp c06f2000 *:514
root     syslogd      109   14 /           4362 crw-------    klog r 

in my case it's c07cb940. Now get a list of processes using it:
rochebonne# fstat |grep c07cb940
root     perl         191    4* unix dgram c0bf2b00 <-> c07cb940
root     mountd       164    4* unix dgram c0bd9480 <-> c07cb940
root     lfs_cleanerd   134    4* unix dgram c0a42e80 <-> c07cb940
root     lfs_cleanerd   133    4* unix dgram c0a42e80 <-> c07cb940
root     syslogd      109    3* unix dgram c07cb940
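
The two steps above can be chained in a small script. This is only a sketch: it
assumes fstat prints the columns exactly as in the output above, and the
function names (syslog_socket_addr, syslog_clients, list_syslog_clients) are
made up for this example, not part of any tool.

```shell
#!/bin/sh
# Sketch only: assumes the fstat column layout shown above
# (user, command, pid, fd, then "unix dgram <addr> [<-> <peer>]").

# Read fstat output on stdin and print the address of syslogd's own
# unix datagram socket (the syslogd line that has no "<->" peer).
syslog_socket_addr() {
    awk '$2 == "syslogd" && $5 == "unix" && $6 == "dgram" && !/<->/ {
        print $7
        exit
    }'
}

# Read fstat output on stdin and print "command pid" for every process
# whose unix socket is connected to the address given as $1.
syslog_clients() {
    awk -v addr="$1" '$5 == "unix" && /<->/ && $NF == addr { print $2, $3 }'
}

# Convenience wrapper for a live system; run it as root so that all
# processes are visible. It is defined here but not called.
list_syslog_clients() {
    out=$(fstat) || return 1
    addr=$(printf '%s\n' "$out" | syslog_socket_addr)
    [ -n "$addr" ] || return 1
    printf '%s\n' "$out" | syslog_clients "$addr"
}
```

With the output above, list_syslog_clients would print perl, mountd and the two
lfs_cleanerd processes together with their pids.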

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
--