Subject: lack of pciide transfer alignment checking causes crash
To: None <tech-kern@netbsd.org>
From: Erik E. Fair <fair@netbsd.org>
List: tech-kern
Date: 06/25/2005 04:39:08
I've been having such fun for the past two weeks trying to get NetBSD 2.0.2
to work reliably on an interesting system:

total memory = 188 MB
avail memory = 176 MB
BIOS32 rev. 0 found at 0xfd780
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Cyrix MMX-enhanced MediaGX (GXm) (586-class), 267.28 MHz, id 0x540
cpu0: features 808131<FPU,TSC,MSR,CX8>
cpu0: features 808131<CMOV,MMX>
cpu0: I-cache 12K uOp cache 8-way
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Cyrix Corporation MediaGX Built-in PCI Host Controller (rev. 0x00)

[...]

pcib0 at pci0 dev 18 function 0
pcib0: Cyrix Corporation Cx5530 I/O Companion Multi-Function South Bridge (rev. 0x00)
Cyrix Corporation Cx5530 I/O Companion (SMI Status and ACPI Timer) (miscellaneous bridge)
at pci0 dev 18 function 1 not configured
geodeide0 at pci0 dev 18 function 2
geodeide0: AMD Geode CX5530 IDE controller (rev. 0x00)
geodeide0: bus-master DMA support present
geodeide0: primary channel wired to compatibility mode
geodeide0: primary channel interrupting at irq 14
atabus0 at geodeide0 channel 0
geodeide0: secondary channel wired to compatibility mode
geodeide0: secondary channel interrupting at irq 15
atabus1 at geodeide0 channel 1
Cyrix Corporation Cx5530 I/O Companion (XpressAUDIO) (audio multimedia) at pci0 dev 18
function 3 not configured
vga1 at pci0 dev 18 function 4: Cyrix Corporation Cx5530 I/O Companion (Video Controller) (rev. 0x00)

The Soekris Engineering web site summarizes a bunch of the bugs:

	http://www.soekris.com/Issue0003.htm

except for a couple of problems with NetBSD, which Soekris thinks are fixed.

They're not.

There is a more complete description of the IDE DMA problems in:

	http://freebsd.active-venture.com/FreeBSD-srctree/newsrc/pci/ide_pci.c.html

Search for "cyrix". It seems clear that both the Cyrix original and the NSC
follow-on chip share the same bugs (same core, same bugs).

I have committed a fix to -current (sys/dev/pci/geodeide.c 1.9) for the UDMA
mode 2 problem (answer: cap at mode 1; that change was easy - the code is
nice and clean on that point) which should be pulled up to 2.0 (I'll request
that in the morning after I get some sleep).

The problem I want help with here on tech-kern is precisely where to enforce
the cache-line (16 byte) alignment requirement for IDE DMA transfers. I have
experimentally determined that any violation of that rule will cause the system
to lock up hard, requiring a power cycle to reset. User-mode access to the
raw device, which, of course, invokes kern_physio(), can trigger this.

This is probably why the NetBSD install failed at the disklabelling step (I
subsequently installed NetBSD on an IDE disk on another system, and then moved
the disk over).

The only reason that NetBSD runs at all on this box is that the typical FFS
filesystem I/O requests are apparently aligned properly.

I think there are two basic choices for handling a misaligned transfer request:

1. return an error (e.g. EIO).

2. use PIO mode for misaligned transfers.

I think that sys/dev/pci/pciide_common.c should have facilities added to it
to enforce alignment (I didn't see any) at a minimum, and I think it would be
nice if it handled misaligned requests gracefully by degrading to PIO mode.
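
To make the two choices concrete, here is a rough, standalone sketch of the
decision in plain C. It is not code from pciide_common.c, and every name in it
(xfer_is_aligned, xfer_choose_path, the enums, XFER_DMA_ALIGN) is hypothetical.
It assumes that both the start address and the byte count must be multiples of
16, and that the address passed in is the one the controller would actually
see; either assumption may need adjusting against the real hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these exist in pciide_common.c. */
#define XFER_DMA_ALIGN 16u	/* cache-line size quoted in this message */

enum xfer_path { XFER_PATH_DMA, XFER_PATH_PIO };

/* Choice 1 is MISALIGN_REJECT, choice 2 is MISALIGN_FALLBACK_PIO. */
enum misalign_policy { MISALIGN_REJECT, MISALIGN_FALLBACK_PIO };

/*
 * Nonzero if both the start address and the byte count are multiples
 * of align. align must be a power of two.
 */
static int
xfer_is_aligned(uintptr_t addr, size_t len, unsigned align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr | (uintptr_t)len) & (uintptr_t)(align - 1)) == 0;
}

/*
 * Pick the transfer path for one request.
 * Returns 0 and sets *path, or returns EIO when the request is
 * misaligned and the policy is to reject it (*path is left untouched).
 */
static int
xfer_choose_path(uintptr_t addr, size_t len, enum misalign_policy policy,
    enum xfer_path *path)
{
	if (xfer_is_aligned(addr, len, XFER_DMA_ALIGN)) {
		*path = XFER_PATH_DMA;
		return 0;
	}
	if (policy == MISALIGN_FALLBACK_PIO) {
		*path = XFER_PATH_PIO;	/* degrade gracefully */
		return 0;
	}
	return EIO;			/* refuse the transfer */
}
```

With MISALIGN_FALLBACK_PIO, a 512-byte request at an address ending in 0x4
would be routed to PIO instead of DMA; with MISALIGN_REJECT the same request
would fail with EIO. Where such a check would sit in the real driver is
exactly the open question.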

Evil clock issues in a forthcoming missive.

Comments?

	Erik <fair@netbsd.org>