netbsd-bugs: kern/23392: Wedges with cmdide on Silicon Image 3112 on IQ80321

Subject: kern/23392: Wedges with cmdide on Silicon Image 3112 on IQ80321
To: None <gnats-bugs@gnats.netbsd.org>
From: None <briggs@ninthwonder.com>
List: netbsd-bugs
Date: 11/08/2003 02:47:20
>Number:         23392
>Category:       kern
>Synopsis:       Wedges with cmdide on Silicon Image 3112 on IQ80321
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Nov 08 02:48:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Allen Briggs
>Release:        NetBSD current (20031108-0100 UTC)
>Organization:
                  Use NetBSD!  http://www.netbsd.org/
>Environment:
NetBSD  1.6ZE NetBSD 1.6ZE (IQ80321) #0: Fri Nov  7 17:05:49 EST 2003
Machine arch: arm
Machine: evbarm
>Description:
	Using a Silicon Image 3112 adapter with one Maxtor 6Y080M0 SATA
	disk attached, I get hard hangs (I can't even break into DDB) when
	I try to access the disk after boot.  For example, if I run
	'disklabel wd0', the machine locks up solid.

	cmdide0 at pci0 dev 6 function 0
	cmdide0: Silicon Image SATALink 3112 (rev. 0x01)
	cmdide0: bus-master DMA support present
	cmdide0: primary channel wired to native-PCI mode
	cmdide0: using irq 29 for native-PCI interrupt
	atabus0 at cmdide0 channel 0
	cmdide0: secondary channel wired to native-PCI mode
	atabus1 at cmdide0 channel 1
	wd0 at atabus0 drive 0: <Maxtor 6Y080M0>
	wd0: drive supports 16-sector PIO transfers, LBA addressing
	wd0: 78167 MB, 158816 cyl, 16 head, 63 sec, 512 bytes/sect x 160086528 sectors
	wd0: 32-bit data port
	wd0: drive supports PIO mode 4, Ultra-DMA mode 6 (Ultra/133)
	wd0(cmdide0:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA data transfers)

	If I drop into DDB before that point and set wdcdebug_mask to 0xff,
	it works fine.  In fact, it seems to work with any value except 2.
	When I use 2, I get:

		wdc_exec_xfer 0xc12e0000 channel 0 drive 0
		wdcstart from wdc_exec_xfer, flags 0x0
		wdcstart: xfer 0xc12e0000 channel 0 drive 0
		wdc_exec_xfer 0xc12e0000 channel 0 drive 0
		wdcstart from wdc_exec_xfer, flags 0x0
		wdcstart: xfer 0xc12e0000 channel 0 drive 0
		<hang>

	This controller still has the 31-second timeout on the empty
	channel, for what that's worth.

	The same controller/disk combination seems to work OK in an i386
	box, but the conditions are a lot different.  The evbarm board has
	basically nothing else going on (4 kernel threads & init on
	evbarm versus 12 on this i386 box).

	It does, however, work fine on a kernel from sources updated with:
	$ cvs -q up -PdA -D"2003/10/08 06:00:00"
	(just before the ATA mid-layer commits from bouyer)

	pciide0 at pci0 dev 6 function 0
	pciide0: Silicon Image SATALink 3112 (rev. 0x01)
	pciide0: bus-master DMA support present
	pciide0: primary channel wired to native-PCI mode
	pciide0: using irq 29 for native-PCI interrupt
	pciide0: secondary channel wired to native-PCI mode
	clock: hz=100 stathz=0 profhz=0
	wd0 at pciide0 channel 0 drive 0: <Maxtor 6Y080M0>
	wd0: drive supports 16-sector PIO transfers, LBA addressing
	wd0: 78167 MB, 158816 cyl, 16 head, 63 sec, 512 bytes/sect x 160086528 sectors
	wd0: 32-bit data port
	wd0: drive supports PIO mode 4, Ultra-DMA mode 6 (Ultra/133)
	wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA data transfers)

	This suggests to me that there is some condition tickled by the
	introduction of the mid-layer.  It's not clear, however, whether
	it's a platform-specific error, a timing error, or something else.

	A Promise PATA (Ultra100TX2) seems to work fine with old and new
	kernels, although it appears to want to drop back to Ultra/33
	when I request a disklabel (on older and current kernels).
>How-To-Repeat:
	See above.
>Fix:
	Unknown.
>Release-Note:
>Audit-Trail:
>Unformatted: