Subject: kern/15841: WDC/ATA PCMCIA and CardBus flash disk unhandled interrupt locks kernel
To: None <gnats-bugs@gnats.netbsd.org>
From: None <tad@entrisphere.com>
List: netbsd-bugs
Date: 03/08/2002 16:32:13
>Number:         15841
>Category:       kern
>Synopsis:       WDC/ATA PCMCIA and CardBus flash disk unhandled interrupt locks kernel
>Confidential:   yes
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Mar 08 16:33:00 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Tad Hunt
>Release:        NetBSD-1.5 (upgraded to current WDC and ATA drivers) i386 and others
>Organization:
	Entrisphere, Inc.
	http://www.entrisphere.com
>Environment:
	
System: NetBSD ntejava.int.entrisphere.com 1.5 NetBSD 1.5 (ENTRISPHERE) #7: Fri Dec 7 13:32:13 PST 2001 tad@ntejava.int.entrisphere.com:/n/ahab/NetBSD-1.5/src/sys/arch/i386/compile/ENTRISPHERE i386

>Description:

I have been using IBM microdrives in both PCMCIA (on our mpc8260
port) and CardBus slots (on our i386 development machines) without
any trouble for months and months.  They work wonderfully.

Now we've begun switching to CompactFlash flash-memory ATA disks
(instead of CompactFlash spinning ATA disks), because we don't want
any moving parts.

I've tried a couple of different PC Card and Compact flash flash
[sic] disks (listed below).

The trouble I'm having is that I'm receiving a disk interrupt that
the wdc driver is not expecting (WDCF_IRQ_WAIT is not set), which
results in the interrupt never being cleared, thus getting stuck
in an infinite interrupt loop.
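
To make the failure mode above concrete, here is a minimal user-space
sketch of it.  It is not the wdc driver: struct sim_channel,
sim_wdc_intr() and sim_dispatch() are hypothetical names, and it assumes
a level-triggered line that keeps re-raising the interrupt until the
device deasserts it.

```c
#include <stdbool.h>

/* Hypothetical model of one ATA channel on a level-triggered IRQ. */
struct sim_channel {
	bool irq_wait;		/* stands in for WDCF_IRQ_WAIT */
	bool irq_asserted;	/* device is driving the interrupt line */
};

/* Driver handler: returns 1 if it claimed the interrupt, 0 if not. */
static int
sim_wdc_intr(struct sim_channel *chp)
{
	if (!chp->irq_wait)
		return 0;	/* not expected: decline, line stays asserted */

	chp->irq_asserted = false;	/* servicing the device drops the line */
	chp->irq_wait = false;
	return 1;
}

/*
 * Re-enter the handler for as long as the line is asserted, as a
 * level-triggered interrupt would.  max_passes only exists so the
 * simulation terminates; the real system has no such cap.
 * Returns the number of passes taken.
 */
static int
sim_dispatch(struct sim_channel *chp, int max_passes)
{
	int passes = 0;

	while (chp->irq_asserted && passes < max_passes) {
		(void)sim_wdc_intr(chp);
		passes++;
	}
	return passes;
}
```

With irq_wait set, sim_dispatch() finishes after one pass.  With it
clear, the handler never touches the device and the loop only ends
because of the artificial cap.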

I have verified that this infinite interrupt loop is the problem
occurring on the mpc8260 board.  I assume it is also why the
NetBSD-i386 host hangs (I haven't instrumented it to check, but the
behavior is identical).

The hang is highly timing-dependent.  It occurs much more frequently
during heavy interrupt activity, and it doesn't seem to be strongly
correlated with disk activity alone.

On our mpc8260 device, the PCMCIA controller and two 8-port UARTs
interrupt on the same IRQ.  It seems to take UART activity coupled
with disk activity to cause the interrupt problem.  The disk alone
seems to run fine.

As soon as I start heavy UART activity, I tend to see several log
messages per second (see the log() call in the fix below).
Acknowledging the interrupt from the ATA drive is enough to make
the interrupt go away, and elicits no complaints from the ATA or
WDC drivers.

The NetBSD-i386 box tends to hang shortly after starting a "dd
if=/dev/r${disk}${rawpart} of=/dev/null bs=512".  However, I don't
know much about how it is configured or what else may be sharing
the disk interrupt.

Obviously, the fix below is quite a hack, but I have spent several
days tracking this problem down and am at a loss for how to proceed.

I tried updating the WDC and ATA drivers to NetBSD-current (as of
last night), but the problem did not go away.

Has anyone seen anything like this before, or have some suggestions on
what I could do to narrow the problem down?

-Tad

------------------------------------------------------------------------------


1) SMART Modular Technologies ATA PC Card (model SM9FLAPC512M1)

   NetBSD identifies this as follows:

	wdc0 at pcmcia0 function 0 port 0x15000000-0x1500000f
	wd0 at wdc0 channel 0 drive 0: <SMART ATA FLASH>
	wd0: drive supports 1-sector PIO transfers, LBA addressing
	wd0: 489 MB, 994 cyl, 16 head, 63 sec, 512 bytes/sect x 1001952 sectors
	wd0: drive supports PIO mode 4

2) AVL Compact Flash Card 128MB (no model number)

   NetBSD identifies this as follows:

	wdc0 at pcmcia0 function 0 port 0x15000000-0x1500000f
	wd0 at wdc0 channel 0 drive 0: <Ritek Corporation>
	wd0: drive supports 1-sector PIO transfers, LBA addressing
	wd0: 122 MB, 978 cyl, 8 head, 32 sec, 512 bytes/sect x 250368 sectors
	

>How-To-Repeat:

1) start a dd if=/dev/r${disk}${rawpart} of=/dev/null bs=512

2) let this run long enough to stabilize:
	If I don't move on to step 3, this tends to run without
	the log messages appearing.

3) cause non-disk interrupts to occur on the same IRQ.
	On the mpc8260 board, I start a file transfer on one of
	the UART channels.

	This causes unexpected ATA interrupts quite frequently
	(averages about once per second)

>Fix:

The hack below is enough to acknowledge the disk interrupt, so we
can continue happily along.  It doesn't appear to cause any data
loss, nor does the wdc driver get confused.

This fix is by no means a general solution, because it forces the
device-independent interrupt handling code to know about ATA
controllers.

------------------------------------------------------------------------------
	intrhandler()
	{
		handled = 0;
		for(ih = handlers_for_this_irq; ih != nil; ih = ih->next) {
			handled += ih->handle(ih->arg);
			if(handled)
				break;
		}
		if(!handled && irq == WDC_DISK_IRQ) {
			ih = find_ata_disk_ih();
			wdcfix(ih->arg);
		}
	}

------------------------------------------------------------------------------

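	void
	wdcfix(void *arg)
	{
		struct channel_softc *chp;

		chp = arg;

		/* The driver is expecting an interrupt; let it handle it. */
		if(chp->ch_flags & WDCF_IRQ_WAIT)
			return;

		/*
		 * Read the drive status (it ends up in chp->ch_status).
		 * Reading the ATA status register acknowledges the interrupt.
		 */
		wdcwait(chp, 0, 0, 0);
		log(LOG_ERR, "%s: read status %02x\n",
			chp->wdc->sc_dev.dv_xname, chp->ch_status);
	}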
------------------------------------------------------------------------------
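
The effect of the hack can be sketched in user space as well.  This is
an illustration under assumptions, not kernel code: struct sim_ata and
the sim_*() functions are hypothetical, the line is assumed to be
level-triggered, and sim_read_status() models the ATA behavior that
reading the status register clears the pending interrupt.

```c
#include <stdbool.h>

/* Hypothetical model of an ATA device and its channel state. */
struct sim_ata {
	bool irq_wait;		/* stands in for WDCF_IRQ_WAIT */
	bool irq_asserted;	/* device is driving the interrupt line */
	unsigned char status;	/* contents of the status register */
};

/* Reading the status register returns it and deasserts the interrupt. */
static unsigned char
sim_read_status(struct sim_ata *d)
{
	d->irq_asserted = false;
	return d->status;
}

/* Normal handler: only services interrupts the driver is waiting for. */
static int
sim_intr(struct sim_ata *d)
{
	if (!d->irq_wait)
		return 0;
	(void)sim_read_status(d);
	d->irq_wait = false;
	return 1;
}

/* Analogue of wdcfix(): acknowledge an interrupt nobody claimed. */
static void
sim_fix(struct sim_ata *d)
{
	if (d->irq_wait)
		return;
	(void)sim_read_status(d);
}

/* Dispatch with the fallback; returns the number of passes taken. */
static int
sim_dispatch_with_fix(struct sim_ata *d, int max_passes)
{
	int passes = 0;

	while (d->irq_asserted && passes < max_passes) {
		if (!sim_intr(d))
			sim_fix(d);	/* unclaimed: force an acknowledge */
		passes++;
	}
	return passes;
}
```

Here an unexpected interrupt is cleared on the first pass instead of
re-entering the handler indefinitely.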

>Release-Note:
>Audit-Trail:
>Unformatted: