Re: system goes unresponsive
On Mon, Dec 17, 2012 at 4:59 AM, Steve Blinkhorn <steve%prd.co.uk@localhost> wrote:
> I have an i386 machine running NetBSD 4.0.1 that has run consistently
> for several years. There have been no recent changes to any part of
> the configuration: it provides the backbone of my local network, name
> service, file service etc. etc.
> A few days ago it started crashing, or so I thought, overnight. It
> was almost unresponsive, but not quite. It would issue a login
> prompt on a virtual terminal on its console, and echo the login but
> not issue the password prompt.
> Since we had been the subject of a substantial DoS attack from China,
> I assumed that was the problem, added a couple of extra rules on my
> router firewall, but next morning same problem.
> So I thought, maybe there's some form of attack that's causing the
> system to run out of processes. So last night I left top(1) running on
> a virtual terminal. This morning there was the same problem, but top
> was still updating regularly and showing the system as essentially
> 100% idle, with ample free memory and swap space, and only 75
> processes (which is about baseline for this machine).
> I can't find any panic message in the logs, but from the absence of
> the normal rhythm of log entries, it seems that the problem occurs
> sometime around 0315, which strikes me as significant in terms of
> daily housekeeping.
> Help in diagnosing this problem would be much appreciated. I think
> it's something very basic that I just haven't run into as a problem
When it comes to older i386 hardware (I'm assuming older since you're
using 4.x), I'm pessimistic.
0315 is probably related to daily maintenance.
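If you want to pin the time down more precisely than by eyeballing the logs, something like the following reports the longest quiet stretch in a log file. It's only a sketch: largest_gap is a name I made up, and it assumes classic BSD syslog timestamps ("Dec 17 03:14:59 host ...") with all entries from the same month.

```sh
#!/bin/sh
# largest_gap FILE: print the longest silence between consecutive log lines.
# Assumptions: lines start with "Mon DD HH:MM:SS" (classic syslog format)
# and all entries fall within one month; month rollover is not handled.
largest_gap() {
    awk '
    {
        # Convert day-of-month plus HH:MM:SS into a running count of seconds.
        split($3, t, ":")
        now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
        if (NR > 1 && now - prev > max) {
            max = now - prev
            from = prev_stamp
            to = $1 " " $2 " " $3
        }
        prev = now
        prev_stamp = $1 " " $2 " " $3
    }
    END {
        if (max > 0)
            printf "%d seconds of silence between %s and %s\n", max, from, to
    }' "$1"
}
```

Run it as "largest_gap /var/log/messages" (assuming that's where your syslog output goes). The last entry before the long silence should tell you roughly when the machine stopped responding.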
Machine hangs in NetBSD/i386 are overwhelmingly related to bad
hardware in my experience.
Once marginal hardware gets stressed by the daily maintenance run, it starts to fail.
Check the SMART status on the hard drives (assuming you're using IDE drives):
atactl wdX smart status
Look for any attributes or errors that indicate a failing drive.
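If there's more than one drive in the box, a small loop saves some typing. This is just a sketch: smart_check is a made-up helper name, and wd0/wd1 below are example device names, so substitute whatever dmesg shows for your disks.

```sh
#!/bin/sh
# smart_check DISK...: run "atactl DISK smart status" for each named drive.
# smart_check is a hypothetical helper; it only wraps the atactl command above.
smart_check() {
    for disk in "$@"; do
        echo "== $disk =="
        # Report a failed query and keep going with the remaining drives.
        atactl "$disk" smart status ||
            echo "$disk: atactl failed (no such drive, or SMART not supported?)"
    done
}
```

For example, "smart_check wd0 wd1" as root. Running it again after the next overnight hang lets you compare the two sets of output.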
Check the dmesg. Do a memory test. It's probably related to memory or
a hard drive. Or maybe a marginal power supply.