Subject: Re: IDE driver misfeature?
To: Jukka Marin <jmarin@embedtronics.fi>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-kern
Date: 08/04/2004 10:45:23
On Wed, Aug 04, 2004 at 08:51:09AM +0300, Jukka Marin wrote:
> Hello,
> 
> Running 2.0beta:
> 
> Is it a feature of the new IDE driver that a failing disk can lock up
> the whole computer?  It feels like the spl level stays very high during
> a disk operation and if a drive is having problems reading a block (for
> example), the computer is (almost) dead until the command is completed.
> 
> I had complete freezes on my desktop machine recently.  Twice the system
> recovered after a wait of tens of seconds.  Once I had to power-cycle
> the system to bring the disk back alive (even the BIOS hung when trying
> to identify it; it's a Seagate Barracuda).
> 
> On a laptop, when the disk and driver were retrying reads, the machine
> didn't even respond to ping.  Instead, all the ping packets were
> received by the laptop - and when the IDE operation was completed, the
> machine sent reply packets to all pings at once.  So it seems network
> reception works during the IDE operations, but the network stack or
> mbuf system or the transmit side are blocked.
> 
> I don't think I've ever seen this under 1.6 - a failing disk would
> prevent or slow down accesses to itself, but it didn't bring the
> whole system to halt.  Maybe something happened to the driver when
> it was split into different chipset drivers?

It happened on 1.6 too. It should be better in 2.0, now that recovery is
handled by a kernel thread instead of busy-waiting in the interrupt routine
(the network stack, for example, should keep working during recovery).
I admit I didn't have a failing drive to test this recently :)
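To illustrate the difference between busy-waiting in the interrupt routine and
handing recovery to a thread, here is a minimal userland sketch. It assumes
POSIX threads stand in for a kernel thread and a 200 ms sleep stands in for
the drive's retries; all names (chan_sketch, intr_sketch, recovery_thread and
the helpers) are hypothetical and are not the NetBSD ATA driver's API. The
"interrupt handler" only flags the error and wakes the worker, so it returns
at once; the slow reset runs in the worker with the lock dropped.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical channel state (NOT NetBSD's struct ata_channel). */
struct chan_sketch {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    bool need_reset;   /* set by the "interrupt" path */
    bool shutdown;     /* asks the worker to exit */
    int  resets_done;  /* number of completed recoveries */
};

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Stand-in for the slow part (drive reset plus retries): 200 ms here,
 * possibly tens of seconds on a failing disk. */
static void slow_reset(void)
{
    struct timespec d = { 0, 200L * 1000 * 1000 };
    nanosleep(&d, NULL);
}

static void chan_init(struct chan_sketch *ch)
{
    pthread_mutex_init(&ch->lock, NULL);
    pthread_cond_init(&ch->cv, NULL);
    ch->need_reset = false;
    ch->shutdown = false;
    ch->resets_done = 0;
}

/* "Interrupt handler": note the error, wake the worker, return at once.
 * A busy-waiting design would call slow_reset() right here instead. */
static void intr_sketch(struct chan_sketch *ch)
{
    pthread_mutex_lock(&ch->lock);
    ch->need_reset = true;
    pthread_cond_broadcast(&ch->cv);
    pthread_mutex_unlock(&ch->lock);
}

/* Recovery thread: sleeps until work arrives, then does the slow reset
 * with the lock dropped so the rest of the program keeps running. */
static void *recovery_thread(void *arg)
{
    struct chan_sketch *ch = arg;

    pthread_mutex_lock(&ch->lock);
    for (;;) {
        while (!ch->need_reset && !ch->shutdown)
            pthread_cond_wait(&ch->cv, &ch->lock);
        if (!ch->need_reset)
            break;                       /* shutdown requested, no work left */
        ch->need_reset = false;
        pthread_mutex_unlock(&ch->lock);
        slow_reset();
        pthread_mutex_lock(&ch->lock);
        ch->resets_done++;
        pthread_cond_broadcast(&ch->cv); /* wake anyone waiting for completion */
    }
    pthread_mutex_unlock(&ch->lock);
    return NULL;
}

/* Block until at least n recoveries have completed. */
static void chan_wait_resets(struct chan_sketch *ch, int n)
{
    pthread_mutex_lock(&ch->lock);
    while (ch->resets_done < n)
        pthread_cond_wait(&ch->cv, &ch->lock);
    pthread_mutex_unlock(&ch->lock);
}

/* Ask the worker to exit once any pending reset is done. */
static void chan_stop(struct chan_sketch *ch)
{
    pthread_mutex_lock(&ch->lock);
    ch->shutdown = true;
    pthread_cond_broadcast(&ch->cv);
    pthread_mutex_unlock(&ch->lock);
}
```

Usage: start recovery_thread() with pthread_create(), then call intr_sketch()
from the simulated interrupt path. It returns in far less than the 200 ms the
reset takes, and chan_wait_resets() confirms the work completed in the worker.
Build with -pthread.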

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--