Subject: Re: Problems with Promise PDC40718 SATA 300 controller card under NetBSD-3.0
To: None <current-users@netbsd.org>
From: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
List: current-users
Date: 04/28/2006 17:27:32
Hello. More investigation reveals that the problem appears to be that
if the driver thinks a drive timed out, and it resets that drive's channel,
then interrupts for that drive are never delivered again, and that drive
and channel become unresponsive. I've been trying to understand what the
FreeBSD code does, but I'm having some trouble getting my head around it.
In any case, it looks like our pdc205xx_do_reset() routine may not be
getting things back to a working state. Or, perhaps, it's resetting the
card, but that's disabling interrupts on it for some reason.
So, the order in which things break now is:
1. All's working fine.
2. We get a lost interrupt message from the wdc.c code, followed by a
timeout reading or writing fsbn from one of the drives on the indicated
controller.
3. We then get a few more lost interrupt messages, each accompanied by a
device time out reading or writing some block number.
4. After about four or so of these, we get a reset failed message on the
indicated channel, drive 0, followed by a wait timed out message, followed
by another device timed out reading or writing message.
5. Finally, we get stuck in a loop where we get reset channel failed for
drive 0 messages, alternating with device time outs for specific block
numbers on the afflicted disk. This state persists until reboot.
So, I think we have two problems.
1. There's a timing bug where we're either missing an interrupt for a
drive, or the cards are occasionally dropping interrupts.
2. When this condition is encountered, the measures we take to remedy the
situation, i.e. reset the drive or the channel on the card, don't help,
and, in fact, they seem to make things worse.
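To make the second problem concrete, here is a toy model of the suspected
failure, not the actual driver code. Everything in it is hypothetical: the
toy_channel structure, the single interrupt-enable flag, and the assumption
that a hardware reset clears that flag are stand-ins for whatever the real
card does, which is exactly what I don't know yet.

```c
#include <stdbool.h>

/* Hypothetical model of one channel; NOT the real Promise register layout. */
struct toy_channel {
    bool intr_enabled; /* stands in for a per-channel interrupt-enable bit */
    int completed;     /* commands whose completion interrupt arrived */
    int timeouts;      /* commands that ran into the driver's timeout */
};

/* Assumption: a hardware reset clears the interrupt-enable bit. */
static void toy_hw_reset(struct toy_channel *ch)
{
    ch->intr_enabled = false;
}

/* Suspected behaviour: reset the channel but never re-enable interrupts. */
static void toy_reset_buggy(struct toy_channel *ch)
{
    toy_hw_reset(ch);
}

/* What a working reset would have to do under the same assumption. */
static void toy_reset_fixed(struct toy_channel *ch)
{
    toy_hw_reset(ch);
    ch->intr_enabled = true;
}

/* Issue one command; returns true if its completion interrupt is delivered. */
static bool toy_issue(struct toy_channel *ch)
{
    if (ch->intr_enabled) {
        ch->completed++;
        return true;
    }
    ch->timeouts++; /* no interrupt, so the driver would see a timeout */
    return false;
}
```

In this model, every toy_issue() call after toy_reset_buggy() times out,
which matches the endless loop in step 5, while toy_reset_fixed() brings
the channel back. Whether the real card behaves this way is only a guess.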
Does anyone have suggestions about where I might read about how the
Promise sata300 cards are supposed to work?
-thanks
-Brian