Subject: Re: kern/35071: panic: mpt_get_request: corrupted request free list (xfer)
To: Tracy Di Marco White <tjd-nb-pr@menelos.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: netbsd-bugs
Date: 12/03/2006 12:07:34
On Sun, Dec 03, 2006 at 04:34:18AM -0600, Tracy Di Marco White wrote:
> 
> In message <20061202185501.GA16429@antioche.eu.org>, Manuel Bouyer writes:
> >OK, the command resets, and later the chip says it's complete while
> >we've already freed it. I think we should just issue a bus reset
> >(or bus_device_reset but it's harder to do) in case of timeout, and
> >let the controller complete the commands.
> >
> >Attached is a patch that attempts to implement a bus_reset function for
> >mpt(4). You can easily test by starting some I/O (e.g. dd if=/dev/rsdxd
> >of=/dev/null bs=1m) and while it's running issue several scsictl scsibusx reset
> >
> >I expect to see "IOC Bus Reset Port %d" or "External Bus Reset" on console
> 
> I occasionally get this:
> probe(mpt2:0:0:0): command timeout
> mpt2: timeout on request index = 0xfe, seq = 0x00000068
> mpt2: Status 0x80000000, Mask 0x00000001, Doorbell 0x24000000
> mpt2: request state: On Chip
> 
> over and over at boot, on different controllers.
> Now, instead, it seems to hang here instead of repeating.
> When I get this I need to reboot anyway until I don't get it,
> as usually whatever is on the scsi chain complaining will not
> be found.

So when we issue a bus reset, the IOC doesn't abort the pending commands
that it has in its queue. It's hard to understand how such a rarely-used
feature works by reverse-engineering other drivers; I'm not even sure it
works properly in other drivers ...
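
To make the ownership rule I have in mind concrete (on timeout, don't free
the request; ask for a bus reset and let the completion path be the only
place that frees it), here is a small standalone model. It is only a
sketch: it is not the mpt(4) code, and every name in it (toy_softc,
toy_req, toy_get, toy_free, toy_timeout, toy_done, bus_resets) is
hypothetical. It also assumes the chip eventually completes the command
after the reset, which, as said above, is exactly what doesn't seem to
happen right now.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is a toy model, not mpt(4). */
enum req_state { REQ_FREE, REQ_ON_CHIP, REQ_TIMEDOUT };

struct toy_req {
	enum req_state state;
	struct toy_req *next;	/* free-list link */
};

struct toy_softc {
	struct toy_req *free_list;
	int bus_resets;		/* number of bus resets requested */
};

/* Put a request back on the free list; refuse a double free. */
static int
toy_free(struct toy_softc *sc, struct toy_req *req)
{
	if (req->state == REQ_FREE)
		return -1;	/* freeing twice would corrupt the list */
	req->state = REQ_FREE;
	req->next = sc->free_list;
	sc->free_list = req;
	return 0;
}

/* Take a request off the free list and mark it as owned by the chip. */
static struct toy_req *
toy_get(struct toy_softc *sc)
{
	struct toy_req *req = sc->free_list;

	if (req == NULL)
		return NULL;
	/* A real driver would panic here on a corrupted free list. */
	assert(req->state == REQ_FREE);
	sc->free_list = req->next;
	req->next = NULL;
	req->state = REQ_ON_CHIP;
	return req;
}

/* Timeout: the chip keeps ownership; we only ask for a bus reset. */
static void
toy_timeout(struct toy_softc *sc, struct toy_req *req)
{
	if (req->state != REQ_ON_CHIP)
		return;
	req->state = REQ_TIMEDOUT;
	sc->bus_resets++;	/* stand-in for issuing the real bus reset */
}

/* Completion reported by the chip: the only place a request is freed. */
static int
toy_done(struct toy_softc *sc, struct toy_req *req)
{
	return toy_free(sc, req);
}
```

With this rule a late completion for a timed-out request is harmless: the
request is still off the free list when toy_done() runs, and a second
completion for the same request is rejected instead of being linked into
the list twice.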

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference