Subject: Re: kern/26873: wd driver error recovery broken
To: None <kardel@Orcus.project.Acrys.COM>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: netbsd-bugs
Date: 09/11/2004 19:49:47
--KsGdsel6WgEHnImy
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Sep 07, 2004 at 09:48:08AM +0200, kardel@Orcus.project.Acrys.COM wrote:
> [...]
> More recent kernels (20040831) exhibit a really problematic behaviour. Error
> recovery seems to stop after:
> wd4: transfer error, downgrading to Ultra-DMA mode 4
> wd4(viaide0:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)
> 
> No more disk activity; flushing the disk cache fails on reboot. Disk accesses
> (like fsck) hang. The file system eventually hangs with vnlocks; basically this
> leads to a slowly locking-up system.

Don't you see 2 more lines before the hang, like:
wd2: transfer error, downgrading to Ultra-DMA mode 3
wd2(pdcide0:0:0): using PIO mode 4, Ultra-DMA mode 3 (using DMA data transfers)
wd2d: error reading fsbn 0 (wd2 bn 0; cn 0 tn 0 sn 0), retrying
wd2: (interface CRC error)

This is what I see on my testbed (I see the hang too).
Anyway, please try the attached patch; it fixes the problem for me.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--

--KsGdsel6WgEHnImy
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=diff

Index: ata.c
===================================================================
RCS file: /cvsroot/src/sys/dev/ata/ata.c,v
retrieving revision 1.59
diff -u -r1.59 ata.c
--- ata.c	21 Aug 2004 00:48:32 -0000	1.59
+++ ata.c	11 Sep 2004 17:49:13 -0000
@@ -864,6 +864,7 @@
 	if ((flags & (AT_POLL | AT_WAIT)) == 0) {
 		if (chp->ch_flags & ATACH_TH_RESET) {
 			/* No need to schedule a reset more than one time. */
+			chp->ch_queue->queue_freeze--;
 			return;
 		}
 		chp->ch_flags |= ATACH_TH_RESET;

--KsGdsel6WgEHnImy--