tech-kern archive


Re: Questions about pci configuration and the mpt(4) driver

        Hello.  I'm sure the problem is caused by a hardware error or firmware
bug.  However, when one drive fails, the entire controller can currently go
out to lunch.  The FreeBSD driver deals with this by keeping a kernel
thread running which watches for commands that the CAM layer has put on the
expired queue and re-runs them through the IOC.  I'm playing with a patch
I've cooked up which tries to drain the IOC queues when a timeout occurs, to
see if that makes things better.  Right now, if a timeout occurs, our driver
seems to do the right thing at the scsipi layer, but does nothing to make
the IOC happier.  Eventually the IOC runs out of queue buffers and
everything comes to a grinding halt.
If this works, or a variation of it, then I may not need to learn how to
re-initialize the controller in flight.
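
To make the failure mode above concrete, here is a rough user-space model
of it, not driver code.  Every name in it (ioc_model, ioc_timeout,
ioc_drain, NREQ) is made up for illustration and none of it comes from
mpt(4).  It assumes the "queue buffers" behave like a fixed pool of request
slots, and that a timeout today fails the command upward without handing
the slot back.  A real drain would also have to get the IOC itself to give
up the command, which this sketch does not model.

```c
/* User-space model only; all names here are hypothetical, not from mpt(4). */
#include <assert.h>
#include <stddef.h>

#define NREQ 4	/* hypothetical size of the request-slot pool */

enum req_state { REQ_FREE, REQ_INFLIGHT, REQ_TIMEDOUT };

struct ioc_model {
	enum req_state req[NREQ];
};

void
ioc_init(struct ioc_model *ioc)
{
	for (size_t i = 0; i < NREQ; i++)
		ioc->req[i] = REQ_FREE;
}

/* Claim the lowest free slot; returns its index, or -1 if none is left. */
int
ioc_submit(struct ioc_model *ioc)
{
	for (int i = 0; i < NREQ; i++) {
		if (ioc->req[i] == REQ_FREE) {
			ioc->req[i] = REQ_INFLIGHT;
			return i;
		}
	}
	return -1;
}

/* Normal completion: the slot goes straight back to the pool. */
void
ioc_complete(struct ioc_model *ioc, int i)
{
	assert(i >= 0 && i < NREQ);
	assert(ioc->req[i] == REQ_INFLIGHT);
	ioc->req[i] = REQ_FREE;
}

/*
 * Timeout as assumed above: the upper layer is told the command failed,
 * but the slot is left stranded instead of being returned.
 */
void
ioc_timeout(struct ioc_model *ioc, int i)
{
	assert(i >= 0 && i < NREQ);
	assert(ioc->req[i] == REQ_INFLIGHT);
	ioc->req[i] = REQ_TIMEDOUT;
}

/* Sweep the pool and reclaim stranded slots; returns how many were freed. */
int
ioc_drain(struct ioc_model *ioc)
{
	int reclaimed = 0;

	for (size_t i = 0; i < NREQ; i++) {
		if (ioc->req[i] == REQ_TIMEDOUT) {
			ioc->req[i] = REQ_FREE;
			reclaimed++;
		}
	}
	return reclaimed;
}

/* Number of slots currently available to ioc_submit(). */
int
ioc_free_count(const struct ioc_model *ioc)
{
	int n = 0;

	for (size_t i = 0; i < NREQ; i++)
		if (ioc->req[i] == REQ_FREE)
			n++;
	return n;
}
```

In this model, once NREQ commands have timed out, ioc_submit() fails until
ioc_drain() is called.  That is the "grinding halt" in miniature, and the
drain is the piece the patch is meant to supply.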


On Nov 17, 11:34am, David Young wrote:
} Subject: Re: Questions about pci configuration and the mpt(4) driver
} On Sat, Nov 17, 2012 at 02:18:18AM -0800, Brian Buhrow wrote:
} >     Hello.  I've been working on an issue with the mpt(4) driver, the
} > driver for the LSI Fusion SCSI controller cards and raid cards.  In the
} > process of working through the issue, I've discovered that the mpt(4)
} > driver is very fragile if the need to reset the hardware arises.  In
} > particular, if a hardware reset is done, all of the pci configuration
} > registers get zorched, causing interrupt handling to fail and requests to
} > get stuck in the driver and hardware's queue.
} When does the need for a reset arise?  Is the cause a driver bug or a
} hardware/firmware bug?
} It sounds to me like you should detach the device (perhaps resetting
} in the final stages of the detachment---i.e., before unmapping the
} registers) and re-attach it.
} Since detachment ordinarily loses all of the software state, you may
} need to stash the outstanding requests somewhere that the re-attached
} device can find them.
} Dave
} > I've been looking for
} > examples of how to reset the PCI registers after such a reset, but neither
} > the OpenBSD nor the FreeBSD drivers offer a clue.  All BSD drivers I've looked
} > at lament the problem, but none provide a solution.  I've considered
} > extracting the PCI initialization process from the mpt_pci_attach() routine
} > into a separate function that can be called at any time while things are
} > running, but there must be a reason this hasn't been done already and why I
} > don't see any examples that look obvious to me of any drivers that do this.
} > Is it safe to call pci_intr_disestablish() and pci_intr_establish() during
} > the course of normal multi-user operation for a particular driver as a
} > means of re-attaching interrupts to a device that's forgotten how to
} > generate them?  Are there any examples of drivers that do a complete reset
} > of the hardware, including pci and pci interrupt settings while continuing
} > to operate in multi-user mode?
} > -thanks
} > -Brian
} > 
} -- 
} David Young
}    Urbana, IL    (217) 721-9981
>-- End of excerpt from David Young
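
For what it's worth, the "stash the outstanding requests" idea in the
quoted text might look roughly like the toy model below.  It is only a
sketch under assumptions: the struct and function names (softc,
model_detach, model_attach, stash_head) are invented here, it runs in user
space, and it ignores locking and the real autoconf detach/attach hooks
entirely.  The only point it illustrates is that the stash has to live
outside the per-attachment state, since that state goes away on detach.

```c
/* Toy model only; names are hypothetical and no kernel API is used. */
#include <assert.h>
#include <stddef.h>

struct request {
	int tag;
	struct request *next;
};

/* Per-attachment state; in this model it does not survive a detach. */
struct softc {
	struct request *pending;
};

/* The stash lives outside any softc so it outlasts a detach. */
struct request *stash_head = NULL;

/* Detach: append the pending list to the stash before the state is lost. */
void
model_detach(struct softc *sc)
{
	struct request **tail = &stash_head;

	while (*tail != NULL)
		tail = &(*tail)->next;
	*tail = sc->pending;
	sc->pending = NULL;
}

/* Attach: the new instance adopts whatever was stashed, in order. */
void
model_attach(struct softc *sc)
{
	sc->pending = stash_head;
	stash_head = NULL;
}

/* Count the requests currently queued on this instance. */
size_t
pending_count(const struct softc *sc)
{
	size_t n = 0;

	for (const struct request *r = sc->pending; r != NULL; r = r->next)
		n++;
	return n;
}
```

A re-attached instance would then have to resubmit whatever it adopted,
which is a separate problem from finding the requests again.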
