Re: port-i386/41706: disk subsystem unresponsive after (recovered) disk failure
The following reply was made to PR port-i386/41706; it has been noted by GNATS.
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Cc: port-i386-maintainer%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost,
Subject: Re: port-i386/41706: disk subsystem unresponsive after (recovered)
Date: Tue, 28 Jul 2009 21:58:59 +0200
On Sun, Jul 12, 2009 at 03:05:00PM +0000, bad%bsd.de@localhost wrote:
> sd1 failed on the above system a couple of days ago. What I could see
> on the console were the messages from ahc1 being reset. sd1 became
> unready and would no longer respond positively to a TEST UNIT READY command
> (firmware diagnostic failure given as the reason).
> The system sat there for 2 more days without further kernel messages.
> Pressing return on the console would produce a new login prompt from getty.
> The system was pingable and did accept TCP connections (e.g. to the SSH
> But no disk IO would happen and no error messages were printed.
> IOW, the block IO subsystem seems to have been deadlocked at a high level.
This is an issue with timeouts in the ahc driver (I found it with a tape drive
where some mt or chio operation would take too long). I have a patch for this
(on a powered-down system; I'll have a look tomorrow).
From memory, the workaround was to not send a BDR message and directly do a
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference