Subject: kern/28255: pciide UDMA mode immediately downgraded to PIO on CRC error
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <tsutsui@ceres.dti.ne.jp>
List: netbsd-bugs
Date: 11/12/2004 15:20:01
>Number:         28255
>Category:       kern
>Synopsis:       pciide UDMA mode immediately downgraded to PIO on CRC error
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Nov 12 15:20:00 +0000 2004
>Originator:     Izumi Tsutsui
>Release:        NetBSD 2.99.10
>Organization:
>Environment:
System: NetBSD mirage 2.99.10 NetBSD 2.99.10 (MIRAGE) #322: \
Fri Nov 12 23:24:42 JST 2004 \
tsutsui@mirage:/usr/src/sys/arch/i386/compile/MIRAGE i386
Architecture: i386
Machine: i386
>Description:
According to sys/dev/ata/atavar.h, a UDMA mode downgrade should occur
when NERRS_MAX (currently 4) CRC errors happen within at most NXFER
(currently 4000) xfers.
In sys/dev/ic/wdc.c:wdc_drvprobe(), n_dmaerrs (the error counter)
is initialized to (NERRS_MAX-1), so a downgrade happens on a single
CRC error within the first NXFER xfers.
But with the current ata code, a single CRC error causes a downgrade
from UDMA mode 4 all the way to PIO mode, even when the error occurs
after more than NXFER xfers.

>How-To-Repeat:
Here are the logs from my PC:
---
Nov  5 00:39:51 mirage /netbsd: wd1a: error reading fsbn 24300752 of 24300752-24300879 (wd1 bn 139644112; cn 138535 tn 13 sn 13), retrying
Nov  5 00:39:51 mirage /netbsd: wd1: (aborted command, interface CRC error)
Nov  5 00:39:51 mirage /netbsd: wd1: soft error (corrected)
Nov  5 00:39:51 mirage /netbsd: wd1: transfer error, downgrading to Ultra-DMA mode 4
Nov  5 00:39:51 mirage /netbsd: wd1(hptide0:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)
Nov  5 00:39:51 mirage /netbsd: wd1: transfer error, downgrading to Ultra-DMA mode 3
Nov  5 00:39:51 mirage /netbsd: wd1(hptide0:0:0): using PIO mode 4, Ultra-DMA mode 3 (using DMA data transfers)
Nov  5 00:39:51 mirage /netbsd: wd1: transfer error, downgrading to Ultra-DMA mode 2
Nov  5 00:39:51 mirage /netbsd: wd1(hptide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
Nov  5 00:39:51 mirage /netbsd: wd1: transfer error, downgrading to Ultra-DMA mode 1
Nov  5 00:39:51 mirage /netbsd: wd1(hptide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data transfers)
Nov  5 00:39:51 mirage /netbsd: wd1: transfer error, downgrading to PIO mode 4
Nov  5 00:39:51 mirage /netbsd: wd1(hptide0:0:0): using PIO mode 4
---

Log with a debug printf() added to ata_dmaerr() to print
drvp->n_dmaerrs and drvp->n_xfers:
---
Nov  6 05:30:20 mirage /netbsd: ata_dmaerr(): n_dmaerrs = 4, n_xfers = 4001
Nov  6 05:30:20 mirage /netbsd: wd1a: error reading fsbn 36322326 of 36322326-36322327 (wd1 bn 151665686; cn 150461 tn 15 sn 53), retrying
Nov  6 05:30:20 mirage /netbsd: wd1: (aborted command, interface CRC error)
Nov  6 05:30:20 mirage /netbsd: ata_dmaerr(): n_dmaerrs = 2, n_xfers = 2
Nov  6 05:30:20 mirage /netbsd: wd1: soft error (corrected)
Nov  6 05:30:20 mirage /netbsd: ata_dmaerr(): n_dmaerrs = 3, n_xfers = 3
Nov  6 05:30:20 mirage /netbsd: ata_dmaerr(): n_dmaerrs = 4, n_xfers = 4
Nov  6 05:30:20 mirage /netbsd: wd1: transfer error, downgrading to Ultra-DMA mode 4
Nov  6 05:30:20 mirage /netbsd: wd1(hptide0:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)
Nov  6 05:30:20 mirage /netbsd: ata_dmaerr(): n_dmaerrs = 4, n_xfers = 1
Nov  6 05:30:20 mirage /netbsd: wd1: transfer error, downgrading to Ultra-DMA mode 3
Nov  6 05:30:20 mirage /netbsd: wd1(hptide0:0:0): using PIO mode 4, Ultra-DMA mode 3 (using DMA data transfers)
Nov  6 05:30:20 mirage /netbsd: ata_dmaerr(): n_dmaerrs = 4, n_xfers = 1
Nov  6 05:30:20 mirage /netbsd: wd1: transfer error, downgrading to Ultra-DMA mode 2
Nov  6 05:30:20 mirage /netbsd: wd1(hptide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
Nov  6 05:30:20 mirage /netbsd: ata_dmaerr(): n_dmaerrs = 4, n_xfers = 1
Nov  6 05:30:20 mirage /netbsd: wd1: transfer error, downgrading to Ultra-DMA mode 1
Nov  6 05:30:20 mirage /netbsd: wd1(hptide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data transfers)
Nov  6 05:30:20 mirage /netbsd: ata_dmaerr(): n_dmaerrs = 4, n_xfers = 1
Nov  6 05:30:20 mirage /netbsd: wd1: transfer error, downgrading to PIO mode 4
Nov  6 05:30:20 mirage /netbsd: wd1(hptide0:0:0): using PIO mode 4
---

>Fix:
sys/dev/ata/ata_wdc.c:wdc_ata_bio_intr() checks ata_bio->r_error
(which holds the value of the error register), but that member is
updated only in the ata_bio->error == ERROR case and is not cleared on
each transfer. Once a CRC error occurs and ata_bio->r_error is set,
it still contains the error code on subsequent transfers,
so ata_dmaerr() is called repeatedly and the transfer mode
is downgraded all the way to PIO.
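
To illustrate the mechanism, here is a small userland model of the
error accounting. It is only a sketch, not the kernel code: the
counter rules are inferred from the description and the debug log
in this report, and struct drive_model, model_dmaerr() and simulate()
are hypothetical names made up for this example. Only NERRS_MAX and
NXFER (with the values quoted above) come from the source tree.

```c
#include <assert.h>

#define NERRS_MAX 4     /* value quoted in this report */
#define NXFER     4000  /* value quoted in this report */

/* Hypothetical stand-in for the per-drive counters. */
struct drive_model {
	unsigned int n_dmaerrs;  /* DMA errors seen in the current window */
	unsigned int n_xfers;    /* transfers seen in the current window */
	int downgrades;          /* number of mode downgrades so far */
};

/*
 * Assumed accounting rule: downgrade when NERRS_MAX errors are seen
 * within NXFER xfers; if the window has already passed NXFER xfers,
 * restart counting from this error instead.
 */
static void
model_dmaerr(struct drive_model *d)
{
	d->n_dmaerrs++;
	if (d->n_dmaerrs >= NERRS_MAX && d->n_xfers <= NXFER) {
		d->downgrades++;
		d->n_dmaerrs = NERRS_MAX - 1;
		d->n_xfers = 0;
		return;
	}
	if (d->n_xfers > NXFER) {
		d->n_dmaerrs = 1;
		d->n_xfers = 1;
	}
}

/*
 * Run `transfers` transfers where only the first one gets a CRC error.
 * If `sticky` is nonzero the CRC flag is never cleared (the behaviour
 * described in this report); otherwise it is only honoured for the
 * transfer that actually failed (the behaviour after the patch).
 * Returns the number of downgrades.
 */
static int
simulate(int sticky, int transfers)
{
	/* Start past the first NXFER xfers, as in the debug log. */
	struct drive_model d = { NERRS_MAX - 1, NXFER + 1, 0 };
	int crc_flag = 0;
	int i;

	assert(transfers >= 0);
	for (i = 0; i < transfers; i++) {
		if (i == 0)
			crc_flag = 1;   /* the single real CRC error */
		else if (!sticky)
			crc_flag = 0;   /* flag does not outlive the error */
		if (crc_flag)
			model_dmaerr(&d);
		d.n_xfers++;
	}
	return d.downgrades;
}
```

In this model simulate(1, 8) yields five downgrades, which matches
the five steps (UDMA 4, 3, 2, 1, then PIO) seen in the logs, while
simulate(0, 8) yields none.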

Index: ata_wdc.c
===================================================================
RCS file: /cvsroot/src/sys/dev/ata/ata_wdc.c,v
retrieving revision 1.76
diff -u -r1.76 ata_wdc.c
--- ata_wdc.c	28 Oct 2004 07:07:39 -0000	1.76
+++ ata_wdc.c	12 Nov 2004 14:43:36 -0000
@@ -612,8 +612,10 @@
 				drv_err = WDC_ATA_ERR;
 			}
 		}
-		if (ata_bio->r_error & WDCE_CRC)
-			ata_dmaerr(drvp, (xfer->c_flags & C_POLL) ? AT_POLL : 0);
+		if (ata_bio->error == ERROR &&
+		    (ata_bio->r_error & WDCE_CRC) != 0)
+			ata_dmaerr(drvp,
+			    (xfer->c_flags & C_POLL) ? AT_POLL : 0);
 		if (drv_err != WDC_ATA_ERR)
 			goto end;
 	}