Subject: Re: Strange problem with syslogd on NetBSD-release-1-5
To: None <netbsd-users@netbsd.org>
From: Jim Breton <jamesb-netbsd@alongtheway.com>
List: netbsd-users
Date: 08/04/2001 04:11:08
OK, in case anyone is still interested in this. :P My syslogd has once again
stopped logging, and this time I have a ktrace.  syslogd is taking up almost
all my CPU.  I didn't think to check whether that was the case last time,
but the other symptoms appear to be exactly the same as in the case
described in my original message below.  I am now running the released
NetBSD 1.5.1 on i386.  The system is a Pentium 100 with 32 MB of RAM,
running a custom-built kernel (I can supply the config file).  It has been
up for about 19 hours, but uptime doesn't seem to be related: this can
happen minutes after booting, while before my last reboot the machine had
been up for 19 days without this happening at all.

Anyway, here is some data:

load averages:  1.11,  1.43,  1.78

From top:
PID USERNAME PRI NICE   SIZE   RES STATE     TIME   WCPU    CPU COMMAND
112 root      64    0   120K  488K run     327:57 78.52% 78.52% syslogd

ktrace:

   112 syslogd  EMUL  "netbsd"
   112 syslogd  RET   poll 1
   112 syslogd  CALL  poll(0x804f180,0x4,0xffffffff)
   112 syslogd  RET   poll 1
   112 syslogd  CALL  poll(0x804f180,0x4,0xffffffff)
   112 syslogd  RET   poll 1
   112 syslogd  CALL  poll(0x804f180,0x4,0xffffffff)
   112 syslogd  RET   poll 1
   <snip>
   112 syslogd  CALL  poll(0x804f180,0x4,0xffffffff)
   112 syslogd  RET   poll 1
   112 syslogd  CALL  poll(0x804f180,0x4,0xffffffff)
   112 syslogd  RET   poll 1
   112 syslogd  CALL  poll(0x804f180,0x4,0xffffffff)
   <continuous lines like those shown above>
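In the ktrace, each poll(0x804f180,0x4,0xffffffff) call asks about 4 descriptors (nfds = 0x4) with a timeout of 0xffffffff, which is -1 as a signed int and means "wait indefinitely". Every call returns 1, meaning one descriptor is already ready, so syslogd never actually sleeps. poll() is level-triggered: a descriptor keeps being reported until its condition is cleared (pending data read, or an error, hangup or invalid-descriptor condition handled, since POLLERR, POLLHUP and POLLNVAL are reported even when not requested). The Python sketch below (Unix only, because it uses select.poll) reproduces that pattern with a socketpair. It only illustrates the mechanism, assuming one of syslogd's descriptors is stuck in a ready state; it does not tell which descriptor that is or why. The function names are hypothetical.

```python
import select
import socket


def count_ready_polls(rounds: int) -> int:
    """Poll one readable socket `rounds` times without ever reading from it.

    Returns how many of those polls reported the socket ready. Because
    poll() is level-triggered, the pending data keeps the socket readable,
    so every call returns at once even with an infinite timeout.
    """
    writer, reader = socket.socketpair()
    try:
        writer.sendall(b"x")  # one pending byte makes `reader` readable
        poller = select.poll()
        poller.register(reader, select.POLLIN)
        ready = 0
        for _ in range(rounds):
            # -1 means "wait forever", like the 0xffffffff timeout in the
            # ktrace; it never actually waits here.
            if poller.poll(-1):
                ready += 1
            # Deliberately no reader.recv(): the condition is never cleared.
        return ready
    finally:
        writer.close()
        reader.close()


def ready_after_draining() -> bool:
    """Same setup, but consume the data first; then nothing is pending."""
    writer, reader = socket.socketpair()
    try:
        writer.sendall(b"x")
        reader.recv(1)  # drain the pending byte
        poller = select.poll()
        poller.register(reader, select.POLLIN)
        return bool(poller.poll(0))  # timeout 0: check only, never block
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    print(count_ready_polls(1000))
    print(ready_after_draining())
```

Running it prints 1000 and then False: without a read, all 1000 polls return immediately, the same shape as the endless CALL poll / RET poll 1 pairs above, while draining the data lets poll() report nothing.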

My original message is included below.

Anyone have any ideas?


On Tue, Jul 10, 2001 at 05:35:36AM +0000, Jim Breton wrote:
> I am running on NetBSD-release-1-5 which is up to date as of about a
> week ago.  Platform is i386, computer is a Pentium 100 with 32 MB of
> RAM.  It runs a Squid and Junkbuster proxy (from pkgsrc), as well as a
> dnscache process used locally (bound to 127.0.0.1).  The only time it is
> under any real load is when it runs its own cron jobs (updatedb, etc.).
> 
> This problem has happened to me 4 or 5 times now, and I can't see any
> pattern in when it happens or in what could be causing it.
> 
> Here is what happens: the machine normally receives syslog messages over
> UDP from several other machines on the LAN.  It is configured to write
> these to disk (as usual), as well as to a line printer on /dev/lpt0:
> 
> auth,authpriv.info					/dev/lpt0
> 
> It also forwards them to another machine on the LAN, which is not always
> up (I plan on turning this off at some point):
> 
> auth,authpriv.info                                      @192.168.0.150
> 
> Anyway, at some seemingly random point, the NetBSD machine stops
> logging these remote messages entirely, and also stops sending its
> local messages to the printer (as far as I can tell it still logs
> local messages to disk, but I'm not sure this is the case every time).
> 
> I have verified (using tcpdump) that the computer is indeed receiving
> these log messages.
> 
> Restarting syslogd makes everything work fine again, until the next time
> it happens.
> 
> Any ideas on what might cause this?  I can't imagine the cause is that
> the other log server is down, for two reasons: 1) AFAIK, log messages
> are one-way traffic, so this box doesn't even know the other one is
> down; 2) this has happened at least twice, minutes apart, while that
> other machine _was_ running.
> 
> Any insight appreciated.  Thanks!
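The auth,authpriv.info selectors in the quoted message can be exercised without waiting for a real login by injecting a test message by hand. This is a minimal Python sketch, assuming the standard syslog UDP port 514 and the traditional BSD facility and severity codes (auth = 4, authpriv = 10, info = 6), so PRI = 4*8 + 6 = 38 for auth.info and 10*8 + 6 = 86 for authpriv.info. The tag `logtest`, the function names, and the address in the usage note are hypothetical. The datagram carries only the PRI and a tag, with no timestamp or hostname; a receiving syslogd is typically expected to fill those in, but treat that as an assumption.

```python
import socket

# Numeric codes from the traditional BSD <syslog.h> (also listed in RFC 3164).
FACILITY = {"auth": 4, "authpriv": 10}
SEVERITY = {"info": 6}


def pri(facility: str, severity: str) -> int:
    """PRI = facility code * 8 + severity code."""
    return FACILITY[facility] * 8 + SEVERITY[severity]


def build_message(facility: str, severity: str, tag: str, text: str) -> bytes:
    """Minimal BSD-style datagram "<PRI>tag: text" (no timestamp/hostname)."""
    return f"<{pri(facility, severity)}>{tag}: {text}".encode("ascii")


def send_test_message(host: str, port: int = 514, facility: str = "auth",
                      severity: str = "info", tag: str = "logtest",
                      text: str = "syslog test") -> bytes:
    """Send one UDP datagram and return the bytes that were sent.

    UDP gives no delivery confirmation: sendto() returning normally says
    nothing about whether a syslogd on `host` received or logged it.
    """
    data = build_message(facility, severity, tag, text)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(data, (host, port))
    return data
```

For example, `send_test_message("192.0.2.10")` (a placeholder address standing in for the NetBSD box) sends `<38>logtest: syslog test`. Watching for it with tcpdump on the receiving side, then checking the log file and the printer, shows whether the message arrived and whether syslogd handled it.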