On Mon, Dec 17, 2012 at 4:59 AM, Steve Blinkhorn <steve%prd.co.uk@localhost>
wrote:
I have an i386 machine running NetBSD 4.0.1 that has run consistently
for several years. There have been no recent changes to any part of
the configuration: it provides the backbone of my local network, name
service, file service etc. etc.
A few days ago it started crashing, or so I thought, overnight. It
was almost unresponsive, but not quite. It would issue a login
prompt on a virtual terminal on its console, and echo the login but
not issue the password prompt.
Since we had been the subject of a substantial DoS attack from China,
I assumed that was the problem, added a couple of extra rules on my
router firewall, but next morning same problem.
So I thought, maybe there's some form of attack that's causing the
system to run out of processes. So last night I left top(1) running on
a virtual terminal. This morning there was the same problem, but top
was still updating regularly and showing the system as essentially
100% idle, with ample free memory and swap space, and only 75
processes (which is about baseline for this machine).
I can't find any panic message in the logs, but from the absence of
the normal rhythm of log entries, it seems that the problem occurs
sometime around 0315, which strikes me as significant in terms of
daily housekeeping.
Help in diagnosing this problem would be much appreciated. I think
it's something very basic that I just haven't run into as a problem
before.
When it comes to older i386 hardware (I'm assuming older since you're
using 4.x), I'm pessimistic.
0315 is probably related to daily maintenance: the stock root crontab
runs /etc/daily at 03:15.
Machine hangs in NetBSD/i386 are overwhelmingly related to bad
hardware in my experience.
Once the machine gets stressed by the daily maintenance run, the
marginal hardware starts to fail.
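Check the SMART status on the hard drives (assuming you're using IDE
drives), replacing wdX with the actual drive name, such as wd0:
atactl wdX smart status
Look for reported errors, or attributes at or below their failure threshold.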
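If there is more than one drive, a small loop saves typing. This is only a sketch: the function name smart_report is made up for illustration, and the drive names in the example (wd0, wd1) are assumptions, so take the real ones from dmesg. atactl generally needs to be run as root.

```shell
#!/bin/sh
# Sketch: query SMART status for each drive named as an argument.
# smart_report is a hypothetical helper name; drive names are assumptions.

smart_report() {
    for drive in "$@"; do
        echo "=== $drive ==="
        # A failing atactl call is reported on stderr but does not stop the loop.
        atactl "$drive" smart status || echo "atactl failed for $drive" >&2
    done
}

# Example (as root): smart_report wd0 wd1
```

Saving the output somewhere off the machine and comparing it a few days later can show whether any attribute is getting worse.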
Check the dmesg. Do a memory test. It's probably related to memory or
a hard drive. Or maybe a marginal power supply.
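For the dmesg check, something like the following narrows the output down to lines worth reading. Treat it as a rough sketch: the function name disk_complaints is invented here, and the pattern list is a guess at common trouble words, not a complete list of what the kernel can print. Assuming the stock root crontab, the 03:15 job can also be rerun by hand during the day to see whether the load alone triggers the hang.

```shell
#!/bin/sh
# Sketch: filter kernel messages for lines that often accompany failing disks.
# disk_complaints is a hypothetical helper; the pattern list is an assumption.

disk_complaints() {
    # Reads kernel messages on stdin and prints the matching lines.
    # "|| true" keeps a no-match result from counting as a failure.
    grep -i -E 'error|timeout|lost interrupt|retry' || true
}

# Usage (as root, on the affected machine):
#   dmesg | disk_complaints
#   /bin/sh /etc/daily    # rerun the daily job by hand while watching the console
```

An empty result does not clear the hardware; a memory test is still worth running.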
Andy