Subject: WDC/ATA PCMCIA and CardBus problems only with flash (i.e., not w/ microdrive)
To: None <tech-kern@netbsd.org>
From: Tad Hunt <tad@entrisphere.com>
List: tech-kern
Date: 03/08/2002 15:37:53
I have been using IBM microdrives in both PCMCIA (on our mpc8260
port) and CardBus slots (on the i386 development machines) without
any trouble for months and months.  It works wonderfully.

Now we've begun switching to CompactFlash flash-memory ATA disks
(instead of CompactFlash spinning ATA disks like the microdrive),
because we don't want any moving parts.  I've tried a couple of
different PC Card and CompactFlash flash disks; they are listed at
the end of this message.

The trouble is that I'm receiving a disk interrupt that the wdc
driver is not expecting (WDCF_IRQ_WAIT is not set).  The driver
ignores it without reading the status register, so the interrupt
is never cleared and the system gets stuck in an infinite interrupt
loop.

I have verified that this infinite interrupt loop is definitely what
is happening on the mpc8260 board.  I am assuming it is also why
the NetBSD-i386 host hangs (I haven't instrumented it to check, but
the behavior is identical).

The hack below is enough to acknowledge the disk interrupt so that
we can continue happily along.  It doesn't appear to cause any data
loss, nor does it confuse the wdc driver.

------------------------------------------------------------------------------
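	/*
	 * Shared-IRQ dispatcher (mpc8260 port) with the hack added: if no
	 * handler claims an interrupt on the disk IRQ, assume the ATA
	 * device raised it unexpectedly and acknowledge it.
	 * struct intrhand, handlers_for_this_irq and find_ata_disk_ih()
	 * are placeholder names for the port's own handler bookkeeping.
	 */
	void
	intrhandler(int irq)
	{
		struct intrhand *ih;
		int handled = 0;

		for (ih = handlers_for_this_irq; ih != NULL; ih = ih->next) {
			handled = ih->handle(ih->arg);
			if (handled)
				break;	/* stop at the first handler that claims it */
		}
		if (!handled && irq == WDC_DISK_IRQ) {
			ih = find_ata_disk_ih();
			if (ih != NULL)
				wdcfix(ih->arg);
		}
	}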

------------------------------------------------------------------------------

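	/*
	 * Acknowledge an interrupt the wdc driver was not expecting.
	 * Reading the ATA status register deasserts INTRQ; wdcwait()
	 * performs that read and leaves the result in chp->ch_status.
	 * log() comes from <sys/syslog.h>.
	 */
	void
	wdcfix(void *arg)
	{
		struct channel_softc *chp = arg;

		/* The driver expects this interrupt; leave it to wdcintr(). */
		if (chp->ch_flags & WDCF_IRQ_WAIT)
			return;

		/*
		 * mask = bits = 0: any non-busy status satisfies the wait,
		 * so this is essentially a status-register read.
		 */
		wdcwait(chp, 0, 0, 0);
		log(LOG_ERR, "%s: read status %02x\n",
		    chp->wdc->sc_dev.dv_xname, chp->ch_status);
	}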
------------------------------------------------------------------------------

The hang is highly timing-dependent.  It occurs much more often
during heavy interrupt activity, and does not seem to be strongly
correlated with disk activity alone.

On our mpc8260 device, the PCMCIA controller and two 8-port UARTs
share the same IRQ.  It seems to take UART activity combined with
disk activity to trigger the interrupt problem; the disk alone
seems to run fine.

As soon as I start heavy UART activity, I see several of the log
messages above per second.  Acknowledging the interrupt from the
ATA drive is enough to make the interrupt go away, and elicits no
complaints from the ata or wdc drivers.

Here are the steps in my testing:

1) Start a raw read of the disk in the background:
	dd if=/dev/r${disk}${rawpart} of=/dev/null bs=512 &

2) Let this run long enough to stabilize.
	If I stop here, it tends to run without any of the log
	messages appearing.

3) Start a file transfer on one of the UARTs.
	This causes approximately 3 unexpected ATA interrupts per second.

The NetBSD-i386 box tends to hang shortly after step 1.  However, I
don't know much about how it is configured or what else may be
sharing its disk interrupt.

Obviously, the code above is quite a hack, but I have spent several
days tracking this problem down and am at a loss for how to proceed.

I tried updating the WDC and ATA drivers to NetBSD-current (as of
last night), but the problem did not go away.

Has anyone seen anything like this before, or does anyone have
suggestions on how I could narrow the problem down?

-Tad

------------------------------------------------------------------------------


1) SMART Modular Technologies ATA PC Card (model SM9FLAPC512M1)

   NetBSD identifies this as follows:

	wdc0 at pcmcia0 function 0 port 0x15000000-0x1500000f
	wd0 at wdc0 channel 0 drive 0: <SMART ATA FLASH>
	wd0: drive supports 1-sector PIO transfers, LBA addressing
	wd0: 489 MB, 994 cyl, 16 head, 63 sec, 512 bytes/sect x 1001952 sectors
	wd0: drive supports PIO mode 4

2) AVL Compact Flash Card 128MB (no model number)

   NetBSD identifies this as follows:

	wdc0 at pcmcia0 function 0 port 0x15000000-0x1500000f
	wd0 at wdc0 channel 0 drive 0: <Ritek Corporation>
	wd0: drive supports 1-sector PIO transfers, LBA addressing
	wd0: 122 MB, 978 cyl, 8 head, 32 sec, 512 bytes/sect x 250368 sectors