NetBSD-Bugs archive


Re: port-i386/41706: disk subsystem unresponsive after (recovered) disk failure



On Sun, Jul 12, 2009 at 03:05:00PM +0000, bad%bsd.de@localhost wrote:
> >Description:
>       
> sd1 failed on the above system a couple of days ago.  What I could see
> on the console were the messages from ahc1 being reset.  sd1 became
> unready and would no longer respond positively to a TEST UNIT READY command
> (firmware diagnostic failure given as the reason).
> 
> The system sat there for 2 more days without further kernel messages.
> Pressing return on the console would produce a new login prompt from getty.
> The system was pingable and did accept TCP connections (e.g. to the SSH port).
> But no disk IO would happen and no error messages were printed.
> IOW, the block IO subsystem seems to have been deadlocked at a high level.

This is an issue with timeouts in the ahc driver (I found it with a tape drive,
where some mt or chio operations would take too long). I have a patch for this
(it is on a powered-down system; I'll have a look tomorrow).
From memory, the workaround was to skip the BDR (Bus Device Reset) message and
do a bus reset directly.

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--

