Subject: Re: kern/35008: viaide.c v1.35 sometimes fails horribly
To: None <bouyer@NetBSD.org, gnats-admin@netbsd.org,>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: netbsd-bugs
Date: 11/12/2006 23:30:04
The following reply was made to PR kern/35008; it has been noted by GNATS.

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org
Subject: Re: kern/35008: viaide.c v1.35 sometimes fails horribly
Date: Mon, 13 Nov 2006 00:27:25 +0100

 --9amGYk9869ThD9tj
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 
 On Tue, Nov 07, 2006 at 02:40:00PM +0000, perry@piermont.com wrote:
 > I'm running on an amd64 box with a viaide SATA controller. With ACPI
 > not on in the kernel, both version 1.34 and version 1.35 of viaide.c
 > lead to periodic failures to boot (perhaps one in every five times),
 > with the driver spewing errors during boot and failing to read the
 > disk.
 > 
 > However, this PR is about the behavior with ACPI turned on.
 > 
 > Version 1.35 leads to failure about one in every five to ten reboots.
 > I get lots of messages, most of which scroll off the screen,
 > preventing me from writing them down. :(
 > 
 > This is what was left on the screen that I could type in by hand:
 > 
 > [...]
 > : <ST506>
 > wd0: drive supports 1-sector PIO transfers, chs addressing
 > [note: this is a modern drive and does fine most reboots.]
 > wd0: 69632 KB, 1024 cyl, 8 head, 17 sec, 512 bytes/sect x 139264 sectors
 > [that's totally wrong of course, and it works on most boots.]
 > [then we have a bunch of unimportant junk, and then...]
 > wd0(viaide1:0:0): using PIO mode 0
 > viaide1:0:0: wait timed out
 > wd0d: device timeout reading fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0), retrying
 > wd0: soft error (corrected)
 > wd0: mbr partition exceeds disk size
 > wd0: mbr partition exceeds disk size
 > wd0: mbr partition exceeds disk size
 > wd0: mbr partition exceeds disk size
 > boot device: <unknown>
 > root device:
 
 OK, it looks like the drive rejected the IDENTIFY command. Maybe it
 needs a reset (currently the SATA probe resets only the interface, not
 the drive itself, while the old probe resets the drives). Could you try
 the attached patch?
 
 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference
 --
 
 --9amGYk9869ThD9tj
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=diff
 
 Index: wdc.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ic/wdc.c,v
 retrieving revision 1.240
 diff -u -u -r1.240 wdc.c
 --- wdc.c	25 Oct 2006 20:14:00 -0000	1.240
 +++ wdc.c	12 Nov 2006 23:24:14 -0000
 @@ -238,7 +238,13 @@
  	bus_space_write_4(wdr->sata_iot, wdr->sata_control, 0, scontrol);
  
  	tsleep(wdr, PRIBIO, "sataup", mstohz(50));
 -	sstatus = bus_space_read_4(wdr->sata_iot, wdr->sata_status, 0);
 +	/* wait up to 1s for device to come up */
 +	for (i = 0; i < 100; i++) {
 +		sstatus = bus_space_read_4(wdr->sata_iot, wdr->sata_status, 0);
 +		if ((sstatus & SStatus_DET_mask) == SStatus_DET_DEV)
 +			break;
 +		tsleep(wdr, PRIBIO, "sataup", mstohz(10));
 +	}
  
  	switch (sstatus & SStatus_DET_mask) {
  	case SStatus_DET_NODEV:
 @@ -286,6 +292,12 @@
  		aprint_normal("%s: port %d: device present, speed: %s\n",
  		    chp->ch_atac->atac_dev.dv_xname, chp->ch_channel,
  		    sata_speed(sstatus));
 +		/*
 +		 * issue a reset in case only the interface part of the drive
 +		 * is up
 +		 */
 +		if (wdcreset(chp, RESET_SLEEP) != 0)
 +			chp->ch_drive[0].drive_flags = 0;
  		break;
  
  	default:
 
 --9amGYk9869ThD9tj--
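
The bounded polling loop in the first hunk of the patch can be tried outside the kernel with a small user-space sketch. Nothing below is taken from wdc.c: `poll_for_device`, `struct fake_port`, and the two callbacks are hypothetical stand-ins for `bus_space_read_4()` and `tsleep()`, and the `DET_*` constants are assumed values modelled on the low four bits of the SATA SStatus register. It only illustrates the idea of re-reading the status until the device reports itself present, with an upper bound on the wait.

```c
#include <stdint.h>

/* Assumed values, modelled on the SATA SStatus DET field (low 4 bits). */
#define DET_MASK   0x0fu
#define DET_NODEV  0x00u	/* no device detected */
#define DET_DEV    0x03u	/* device present, PHY communication up */

/* Hypothetical callbacks standing in for bus_space_read_4() and tsleep(). */
typedef uint32_t (*read_status_fn)(void *ctx);
typedef void (*sleep_ms_fn)(void *ctx, unsigned ms);

/*
 * Read the status register up to max_polls times, sleeping interval_ms
 * after each read that does not show a device. Returns the last value
 * read, so the caller can switch on the DET field as before.
 */
static uint32_t
poll_for_device(read_status_fn rd, sleep_ms_fn slp, void *ctx,
    unsigned max_polls, unsigned interval_ms)
{
	uint32_t sstatus = 0;
	unsigned i;

	for (i = 0; i < max_polls; i++) {
		sstatus = rd(ctx);
		if ((sstatus & DET_MASK) == DET_DEV)
			break;
		slp(ctx, interval_ms);
	}
	return sstatus;
}

/* Fake port: reports DET_DEV only once more than ready_after reads happened. */
struct fake_port {
	unsigned reads;		/* number of status reads so far */
	unsigned ready_after;	/* reads that still return DET_NODEV */
	unsigned slept_ms;	/* total time "slept", for inspection */
};

static uint32_t
fake_read(void *ctx)
{
	struct fake_port *p = ctx;

	p->reads++;
	return (p->reads > p->ready_after) ? DET_DEV : DET_NODEV;
}

static void
fake_sleep(void *ctx, unsigned ms)
{
	struct fake_port *p = ctx;

	p->slept_ms += ms;	/* record the delay instead of sleeping */
}
```

Calling `poll_for_device(fake_read, fake_sleep, &port, 100, 10)` mirrors the limits used in the patch: 100 polls at 10 ms each give the 1 s ceiling named in its comment, on top of the initial 50 ms sleep. A device that comes up early ends the loop at once, and a device that never appears leaves the last status in the return value for the existing `switch` to handle.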