Subject: Re: Bug found: help to isolate it
To: Lista de NetBSD Users <list10@sepc.edu.mx>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: netbsd-users
Date: 05/19/2002 15:10:40
On Sat, May 18, 2002 at 08:43:12PM -0500, Lista de NetBSD Users wrote:
> On Thu, 16 May 2002, Manuel Bouyer wrote:
>
> > Hum, it looks like it doesn't use any CPU at this point, so it's likely not
> > the same bug.
> > Could you, when this occurs again:
> > 1) do a ps -axl and post the syslogd line
> > 2) keep a root shell around and kill -ABRT syslogd, to get a core dump?
> >
> > BTW, killing syslogd should make things work again.
>
> Thanks, Manuel...
>
> Yesterday, I could see one more machine (1.5.2/i386) in our LAN
> with the same problem... and last night we had a power failure...
> sorry... the servers rebooted fine (with fsck) and the problem
> of syslogd disappeared.
>
> I am compiling new kernels for some machines with the DDB option
> and preparing a shell to kill syslogd. I will use it next time.
OK
>
> This problem is not new for me and I could say it occurs
> every 30 or 60 days in our LAN (we have 13+ boxes 1.5.x/i386)
> and I have been seeing it for two+ years (versions 1.4 and
> maybe 1.3.x)
BTW, I suspect you have some very special setup. I have > 20
NetBSD servers (i386, alpha, sparc), some of them with very long
uptimes (> 500 days), and the only time I've seen this was when someone
hit 'scroll lock' on the console. Unlocking it unwedged syslogd.
Can you give more details about your setup (hardware, console type, software
using syslogd, syslogd.conf, etc.)?
To check which software is using syslogd you can use fstat. First find the
address of syslogd's unix socket:
rochebonne# fstat |grep syslog
root syslogd 109 root / 2 drwxr-xr-x 1024 r
root syslogd 109 wd / 2 drwxr-xr-x 1024 r
root syslogd 109 0 / 4358 crw-rw-rw- null rw
root syslogd 109 1 / 4358 crw-rw-rw- null rw
root syslogd 109 2 / 4358 crw-rw-rw- null rw
root syslogd 109 3* unix dgram c07cb940
root syslogd 109 4* internet6 dgram udp c0bde300 *:514
root syslogd 109 5 / 4354 crw------- console w
root syslogd 109 6 / 1150 -rw-r--r-- 17041 w
root syslogd 109 7 / 1150 -rw-r--r-- 17041 w
root syslogd 109 8 / 1104 -rw------- 26716 w
root syslogd 109 9 / 697 -rw------- 1275 w
root syslogd 109 10 / 1422 -rw------- 0 w
root syslogd 109 11 / 1277 -rw-r----- 0 w
root syslogd 109 12 / 1113 -rw------- 1329 w
root syslogd 109 13* internet dgram udp c06f2000 *:514
root syslogd 109 14 / 4362 crw------- klog r
In my case it's c07cb940. Now get a list of the processes using it:
rochebonne# fstat |grep c07cb940
root perl 191 4* unix dgram c0bf2b00 <-> c07cb940
root mountd 164 4* unix dgram c0bd9480 <-> c07cb940
root lfs_cleanerd 134 4* unix dgram c0a42e80 <-> c07cb940
root lfs_cleanerd 133 4* unix dgram c0a42e80 <-> c07cb940
root syslogd 109 3* unix dgram c07cb940
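The two greps above can be combined into one pass. This is only a rough
sketch: it assumes fstat prints its columns exactly as in the output shown
here (command name in field 2, PID in field 3, socket address in field 7,
peer address after "<->"), and the function names syslog_sock_addr and
syslog_clients are made up for the illustration.

```shell
# Print the address of syslogd's unix datagram socket.
# Reads fstat output on stdin; field positions are assumed from the
# sample output in this message.
syslog_sock_addr() {
    awk '$2 == "syslogd" && $5 == "unix" && $6 == "dgram" { print $7; exit }'
}

# Print "command pid" for every process with a unix socket connected
# to the address given as $1. Reads fstat output on stdin.
syslog_clients() {
    awk -v addr="$1" '$5 == "unix" && $8 == "<->" && $9 == addr { print $2, $3 }'
}

# Only run against the live system if fstat is actually available.
if command -v fstat >/dev/null 2>&1; then
    out=$(fstat)
    addr=$(printf '%s\n' "$out" | syslog_sock_addr)
    if [ -n "$addr" ]; then
        printf '%s\n' "$out" | syslog_clients "$addr"
    fi
fi
```

Taking a single fstat snapshot and filtering it twice avoids the socket
address changing between the two runs (e.g. if syslogd gets restarted).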
--
Manuel Bouyer <bouyer@antioche.eu.org>