NetBSD-Bugs archive


Re: port-i386/41706: disk subsystem unresponsive after (recovered) disk failure



The following reply was made to PR port-i386/41706; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: port-i386-maintainer%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost,
        netbsd-bugs%NetBSD.org@localhost
Subject: Re: port-i386/41706: disk subsystem unresponsive after (recovered) disk failure
Date: Tue, 28 Jul 2009 21:58:59 +0200

 On Sun, Jul 12, 2009 at 03:05:00PM +0000, bad%bsd.de@localhost wrote:
 > >Description:
 >      
 > sd1 failed on the above system a couple of days ago.  What I could see
 > on the console were the messages from ahc1 being reset.  sd1 became
 > unready and would no longer respond positively to a TEST UNIT READY command
 > (firmware diagnostic failure given as the reason).
 > 
 > The system sat there for 2 more days without further kernel messages.
 > Pressing return on the console would produce a new login prompt from getty.
 > The system was pingable and did accept TCP connections (e.g. to the SSH 
 > port).
 > But no disk IO would happen and no error messages were printed.
 > IOW, the block IO subsystem seems to have been deadlocked at a high level.
 
 This is an issue with timeouts in the ahc driver (I found it with a tape drive,
 where some mt or chio operations would take too long). I have a patch for this
 (on a powered-down system, I'll have a look tomorrow).
 From memory, the workaround was to not send a BDR (Bus Device Reset) message
 and to do a bus reset directly instead.
 
 -- 
 Manuel Bouyer <bouyer%antioche.eu.org@localhost>
      NetBSD: 26 years of experience will always make the difference
 --
 

