NetBSD-Users archive


Re: system goes unresponsive



On 2012-12-18 12:49, Steve Blinkhorn wrote:
Just to add that commenting out the crontab entries for daily
maintenance worked, in that the system did not go unresponsive last
night.   Not a long-term solution, obviously, so any further insights
(e.g., which bits of daily maintenance are likely to stress the
system?) might help save me some hours of toil.

The only thing I have experienced is that as the daily job walks the whole file system looking for old temp files and whatnot, it fills all the memory with disk cache, more or less paging out all other processes. Those processes then have to page-fault their way back in and slowly reclaim memory from the disk cache. I say slowly because the system seems to prefer reusing memory allocated to processes over actually taking memory back from the disk cache.

I normally tweak my vm variables violently, to improve my life.
Not sure that others agree with me, but I've found life much more bearable when I do this. Otherwise my experience is somewhat like yours.

        Johnny
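
As an illustration of the kind of tuning described above, here is a minimal sketch of an /etc/sysctl.conf fragment using the vm.*min and vm.*max knobs. The message does not say which variables or values were actually changed, so both the choice of knobs and the percentages below are assumptions; check them against the sysctl manual pages on the installed release before applying anything.

```
# /etc/sysctl.conf (illustrative values only; all are percentages of RAM)
# Cap the share the file cache may hold before it is preferred for reclaim.
vm.filemax=20
# Share always left to the file cache.
vm.filemin=5
# Share of process (anonymous) memory protected from reclaim.
vm.anonmin=30
# Share of program text protected from reclaim.
vm.execmin=10
```

The idea is to make file cache the first candidate for reclaim so that a nightly file-system walk pushes out fewer process pages.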


--
Steve Blinkhorn <steve%prd.co.uk@localhost>

You wrote:

On Mon, Dec 17, 2012 at 4:59 AM, Steve Blinkhorn <steve%prd.co.uk@localhost> wrote:
I have an i386 machine running NetBSD 4.0.1 that has run consistently
for several years.   There have been no recent changes to any part of
the configuration: it provides the backbone of my local network, name
service, file service etc. etc.

A few days ago it started crashing, or so I thought, overnight.   It
was almost unresponsive, but not quite.   It would issue a login
prompt on a virtual terminal on its console, and echo the login but
not issue the password prompt.

Since we had been the subject of a substantial DoS attack from China,
I assumed that was the problem, added a couple of extra rules on my
router firewall, but next morning same problem.

So I thought, maybe there's some form of attack that's causing the
system to run out of processes.   So last night I left top(1) running on
a virtual terminal.   This morning there was the same problem, but top
was still updating regularly and showing the system as essentially
100% idle, with ample free memory and swap space, and only 75
processes (which is about baseline for this machine).

I can't find any panic message in the logs, but from the absence of
the normal rhythm of log entries, it seems that the problem occurs
sometime around 0315, which strikes me as significant in terms of
daily housekeeping.
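
One way to pin down the time of the stall more precisely than from gaps in the system logs is to leave a small heartbeat logger running overnight. The sketch below is only a suggestion: the function name, the log path and the one-minute interval are all assumptions, and it samples nothing beyond date, vmstat and ps.

```sh
# heartbeat LOGFILE: append one timestamped snapshot to LOGFILE.
# The log path and the tools sampled are assumptions, not from the thread.
heartbeat() {
    log=$1
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        # Memory and paging counters, if vmstat is installed.
        if command -v vmstat >/dev/null 2>&1; then
            vmstat
        fi
        # Long listing includes WCHAN, i.e. what each process is waiting on.
        ps axl 2>/dev/null | head -n 40
    } >>"$log" 2>&1
    return 0
}

# Example: sample once a minute across the nightly window.
#   while :; do heartbeat /var/tmp/heartbeat.log; sleep 60; done
```

The last complete snapshot in the log brackets the moment things went wrong, and the WCHAN column may hint at what the stuck processes were waiting for.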

Help in diagnosing this problem would be much appreciated.   I think
it's something very basic that I just haven't run into as a problem
before.

When it comes to older i386 hardware (I'm assuming older since you're
using 4.x), I'm pessimistic.

0315 is probably related to daily maintenance.
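
If the nightly job is the trigger, it can be narrowed down rather than disabled outright. The fragment below is a sketch under two assumptions: that the stock root crontab is what starts /etc/daily at 03:15, and that the find_core and run_security switches behave as described in daily.conf(5). Verify both against the files installed with 4.0.1.

```
# /etc/daily.conf (assumed overrides; re-enable one per night to bisect)
# Skip the file-system walk that looks for core files.
find_core=NO
# Skip /etc/security, which also scans the file systems.
run_security=NO
```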

Machine hangs in NetBSD/i386 are overwhelmingly related to bad
hardware in my experience.

Once it gets stressed by daily maintenance, it goes bad.

Check the SMART status on the hard drives (assuming you're using IDE drives):

atactl wdX smart status

Look for any reported errors.
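
With more than one drive, the same query can be wrapped in a small loop. This is a minimal sketch: the function name and the drive names in the usage line (wd0, wd1) are assumptions, so substitute the devices that dmesg reports.

```sh
# smart_report DRIVE...: run the SMART status query for each named drive.
# Drive names are supplied by the caller; wd0/wd1 below are only examples.
smart_report() {
    for d in "$@"; do
        echo "== $d =="
        if command -v atactl >/dev/null 2>&1; then
            atactl "$d" smart status
        else
            echo "atactl not found; skipping $d"
        fi
    done
}

# Example (as root):
#   smart_report wd0 wd1
```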

Check the dmesg. Do a memory test. It's probably related to memory or
a hard drive. Or maybe a marginal power supply.

Andy





--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt%softjar.se@localhost   ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

