Subject: Re: kern/35008: viaide.c v1.35 sometimes fails horribly
To: None <gnats-bugs@NetBSD.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: netbsd-bugs
Date: 11/13/2006 00:27:25
--9amGYk9869ThD9tj
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Nov 07, 2006 at 02:40:00PM +0000, perry@piermont.com wrote:
> I'm running on an amd64 box with a viaide SATA controller. With ACPI
> not on in the kernel, both version 1.34 and version 1.35 of viaide.c
> lead to periodic failures to boot (perhaps one in every five times),
> with the driver spewing errors during boot and failing to read the
> disk.
> 
> However, this PR is about the behavior with ACPI turned on.
> 
> Version 1.35 leads to failure about one in every five to ten reboots.
> I get lots of messages, most of which scroll off the screen,
> preventing me from writing them down. :(
> 
> This is what was left on the screen that I could type in by hand:
> 
> [...]
> : <ST506>
> wd0: drive supports 1-sector PIO transfers, chs addressing
> [note: this is a modern drive and does fine most reboots.]
> wd0: 69632 KB, 1024 cyl, 8 head, 17 sec, 512 bytes/sect x 139264 sectors
> [that's totally wrong of course, and it works on most boots.]
> [then we have a bunch of unimportant junk, and then...]
> wd0(viaide1:0:0): using PIO mode 0
> viaide1:0:0: wait timed out
> wd0d: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
> wd0: soft error (corrected)
> wd0: mbr partition exceeds disk size
> wd0: mbr partition exceeds disk size
> wd0: mbr partition exceeds disk size
> wd0: mbr partition exceeds disk size
> boot device: <unknown>
> root device:

OK, it looks like the drive rejected the IDENTIFY command. Maybe it
needs a reset (currently the SATA probe resets only the interface, not
the drive itself, while the old probe resets the drives). Could you try
the attached patch?
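
For illustration, the bounded wait that the first hunk of the patch adds
can be sketched outside the kernel as below. This is only a sketch under
stated assumptions, not the driver code: the DET_* constants are
illustrative names for the DET field values from the SATA specification,
and poll_sstatus(), the two callbacks, and struct fake_port are
hypothetical stand-ins for bus_space_read_4(), tsleep(), and the real
hardware.

```c
#include <stdint.h>

/* DET field of the SATA SStatus register (low 4 bits); illustrative names. */
#define DET_MASK   0x0fu
#define DET_NODEV  0x00u	/* no device detected */
#define DET_DEV_NE 0x01u	/* device detected, PHY link not established */
#define DET_DEV    0x03u	/* device detected, PHY link established */

/* Hypothetical callbacks standing in for bus_space_read_4() and tsleep(). */
typedef uint32_t (*read_sstatus_fn)(void *ctx);
typedef void (*sleep_ms_fn)(void *ctx, unsigned ms);

/*
 * Read SStatus up to max_tries times, sleeping interval_ms after each
 * read that does not show an established link. Returns the last value
 * read, so the caller can still switch on the DET field afterwards.
 */
static uint32_t
poll_sstatus(read_sstatus_fn rd, sleep_ms_fn slp, void *ctx,
    unsigned max_tries, unsigned interval_ms)
{
	uint32_t sstatus = 0;
	unsigned i;

	for (i = 0; i < max_tries; i++) {
		sstatus = rd(ctx);
		if ((sstatus & DET_MASK) == DET_DEV)
			break;
		slp(ctx, interval_ms);
	}
	return sstatus;
}

/* Simulated port: the link reports as established after ready_after reads. */
struct fake_port {
	unsigned reads;		/* number of register reads so far */
	unsigned ready_after;	/* reads that return "not established" */
	unsigned slept_ms;	/* total simulated sleep time */
};

static uint32_t
fake_read(void *ctx)
{
	struct fake_port *p = ctx;

	p->reads++;
	return (p->reads > p->ready_after) ? DET_DEV : DET_DEV_NE;
}

static void
fake_sleep(void *ctx, unsigned ms)
{
	struct fake_port *p = ctx;

	p->slept_ms += ms;
}
```

With max_tries = 100 and interval_ms = 10 this matches the "up to 1s"
bound in the patch comment. Returning the last value read rather than a
success flag keeps the existing switch on the DET field unchanged.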

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--

--9amGYk9869ThD9tj
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=diff

Index: wdc.c
===================================================================
RCS file: /cvsroot/src/sys/dev/ic/wdc.c,v
retrieving revision 1.240
diff -u -u -r1.240 wdc.c
--- wdc.c	25 Oct 2006 20:14:00 -0000	1.240
+++ wdc.c	12 Nov 2006 23:24:14 -0000
@@ -238,7 +238,13 @@
 	bus_space_write_4(wdr->sata_iot, wdr->sata_control, 0, scontrol);
 
 	tsleep(wdr, PRIBIO, "sataup", mstohz(50));
-	sstatus = bus_space_read_4(wdr->sata_iot, wdr->sata_status, 0);
+	/* wait up to 1s for device to come up */
+	for (i = 0; i < 100; i++) {
+		sstatus = bus_space_read_4(wdr->sata_iot, wdr->sata_status, 0);
+		if ((sstatus & SStatus_DET_mask) == SStatus_DET_DEV)
+			break;
+		tsleep(wdr, PRIBIO, "sataup", mstohz(10));
+	}
 
 	switch (sstatus & SStatus_DET_mask) {
 	case SStatus_DET_NODEV:
@@ -286,6 +292,12 @@
 		aprint_normal("%s: port %d: device present, speed: %s\n",
 		    chp->ch_atac->atac_dev.dv_xname, chp->ch_channel,
 		    sata_speed(sstatus));
+		/*
+		 * issue a reset in case only the interface part of the drive
+		 * is up
+		 */
+		if (wdcreset(chp, RESET_SLEEP) != 0)
+			chp->ch_drive[0].drive_flags = 0;
 		break;
 
 	default:

--9amGYk9869ThD9tj--