Subject: Re: multiprocessor i386 1.6ZC system crash
To: None <lubos.vrbka@jh-inst.cas.cz>
From: Michael Hertrick <m.hertrick@neovera.com>
List: current-users
Date: 10/14/2003 08:30:14
I have experienced similar, if not identical, behavior with both 1.6 and 
-current.  That kind of lock-up has only happened to me when there was 
either a disk-related problem or an overheated processor, but I've never 
used the multiprocessor kernel. 

One time, I just had the RAID partitions misaligned on the disk(s): it 
ran fine for a few weeks, but got worse as time went on and more of the 
drive was being used.  Then, after a few weeks of putting up with the 
intermittent lock-ups, it got to the point where the parity re-write 
locked it up.  Needless to say, a new disklabel fixed her right up.  
Another time it was a bad IDE controller... it never hurts to keep a few 
around for testing.  And yet another time it was a dying HDD.

So, if you haven't already:

- check your disklabels.  Even the slightest mistake could lead to this 
problem.
- check your dmesg for any nasty-looking I/O messages during disk and 
controller initialization.
- eliminate one hardware component at a time.
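
For the disklabel check, here is a rough Python sketch of the kind of 
sanity test I mean: it flags partitions that overlap each other or run 
past the end of the disk.  The function name, the input format (you 
copy the offset and size, in sectors, from your disklabel by hand) and 
the choice to skip the "c" and "d" whole-disk entries are all my own 
assumptions for illustration; it does not read a real label.

```python
from itertools import combinations


def find_label_problems(partitions, total_sectors, skip=("c", "d")):
    """Return a list of human-readable problems found in a partition table.

    partitions:    dict mapping partition letter -> (offset, size), in sectors
                   (hypothetical format, typed in by hand from the label)
    total_sectors: size of the whole disk in sectors
    skip:          letters to ignore because they are assumed to describe
                   the whole disk or slice and overlap everything by design
    """
    problems = []

    # Ignore whole-disk entries and unused (zero-size) slots.
    real = {
        name: (offset, size)
        for name, (offset, size) in partitions.items()
        if name not in skip and size > 0
    }

    # A partition must end at or before the last sector of the disk.
    for name, (offset, size) in sorted(real.items()):
        if offset + size > total_sectors:
            problems.append(f"{name}: extends past end of disk")

    # Two partitions overlap if each one starts before the other ends.
    for (a, (a_off, a_size)), (b, (b_off, b_size)) in combinations(
        sorted(real.items()), 2
    ):
        if a_off < b_off + b_size and b_off < a_off + a_size:
            problems.append(f"{a} overlaps {b}")

    return problems
```

Feed it the numbers from each disk in turn; an empty list means it 
found nothing wrong, which of course doesn't prove the label is right 
for your RAID layout.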

Maybe that'll help you determine whether you need to update the kernel 
or not.  Then again, it might be easier to upgrade/downgrade before 
checking all that other stuff... s'up to you.

Good luck,
~Mike.


Lubos Vrbka wrote:

> hi guys,
>
> i've got -current running on 2 processor i386 (p3 800) machine. from 
> time to time the whole machine freezes and i can do nothing with it - 
> no keyboard response, no network response... the only thing i can do 
> is to reset... (happened to me twice today :o(, happens randomly ~once 
> per week).
>
> i inspected logs but found nothing. seems that the last thing my 
> computer was doing before crash was restarting syslogd.
> from my observations, the hangups happen only when i'm working (i'm 
> running X from 1.6.1 stable) on the machine, mainly when accessing the 
> harddrive. it's interesting that i ran several few-hour-long 
> calculations that use the harddrive quite a lot, but no crash...
>
> when i used 1.6.1 stable and -current compiled without multiprocessor 
> support i didn't observe such behaviour, but i ran them only short 
> time, so maybe i just "missed" the crash...
>
> any hints? should i update my system again?
>
> please cc-me with your replies as i'm not subscribed to current-users. 
> thanks.
>
> regards,
> lubos
>
>
>