NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/41579 (siisata driver interrupt and command completion handling broken)

Hi, I've a little problem with this request ....

OK, I've integrated the driver into the 4.0 kernel, because I need it there. (I've still no time to have a look at 5.0.1 ...) I've fixed the interrupt handling by the patch I've send to, because the driver was instable. With the patch it seems to be stable now - no problems up to now anymore with only disks connected to the controler. And all three system with this controler type installed are productive systems! They run 24 hours 7 days a week without any interruption ...

I've no "free" system with that hardware to install an other OS-version on it at the moment. I have a spare controler of this type, but currently no free MB where I can insert it - the controler needs a PCI-X-slot ...

If it's good for you that I just check the sources, I will do it again - see below. I've done it prior sending the bug report and the changes at that time there doesn't affect the problem, but there may be something new now.

And there are "some" major changes ...

But the PR_PC_INCOR bit is again not set in the driver and so one main whole is still present where interrupts can be lost. It is very hard to garantie that there will be no interrupt lost on the port, if you clear the cmd-complete interrupt by a side effect! And as a side effect not setting this bit makes debugging of the behaviour here impossible, because you cannot check the PSS register ...

In siisata_intr() you loop now until no bits are set in GR_GIS anymore and check the IS in an inner loop for all channels. Then you check the PIS for the channel and if CMPCMPL is set, you read PSS that clears the CMPCMPL interupt as side effect. This looks OK as long as only one slot is used at all - and I hope this performance issue will be addresses in the near future ... You have moved the whole access to the chip into the intr_port routine - was in cmd_complete before.
This looks good, but there is still the "clear all interrupts" part in ...
I would prefere to eliminate this and clear only the interrups gooing to serve. This is more stable and of cause required if more than on slot will be used in the future. A check with panic() may make sence if there comes any unexpetec interrupt at all - at least if DIGANOSTICS is defined ...

The missing timeout-check in bio_complete is there now.
The usage of PRO_PCS/PRO_PCC has to be corrected.

summary of my analyses:
The main cause why interrupts are lost in the previous version has been fixed by moving all chip-access in front of the completion handling.
This problem schould be solved in the latest version of the driver.
The lost timeout should be fixed and the PCS/PCC usage looks good to me.

Neverless I would strongly recommend to set the PR_PC_INCOR bit to remove this kind of side effect in interrupt handling. It is alway a bad style to use HW side effects, because they normaly generate sync-wholes and you need to write to PIS in any case - currenly still with 0xffffffff without checking for unexpected interrupts - another point that should be changed I think. At least a WARNING should be printed if there is an unexpected interrupt, so that anybody can see that there is still a problem with the driver ... I still prefere a panic() on unecpected interrupts, but thats my way to make drivers stable - do not allow any unexpected behaviour of the HW ... If you just clear the intterupt, it may come up again and than you may run into a busy-loop by processing (or better not processing) the HW-interrupt. And you will not see it with the current implementation ... It is always a good idea for interrupt handlers to clear exactly those interrupts only, that they are gooing to serve. Otherwiese some interrupts may get lost - OK as long as only one slot is used this cannot happen here ...

Sorry that I've no way to run the current version at the moment - neverless testing this problem by running the system may only show that there is still a problem. It will never say, that there is no one left - for this kind of statement, you need to analyse the source code of the interrupt handler. If the request for feedback was intended by a "close-request" for the report, the current source code looks good to me and the problem report can be closed.

By the way. This mail reached me today at 02:34 (AM) and a reminder for that came today at 07:11 (AM).
Is it normal that the systems sends a reminder after less than 5 hours?

W. Stukenbrock wrote:

Synopsis: siisata driver interrupt and command completion handling broken

State-Changed-From-To: open->feedback
State-Changed-When: Thu, 20 Aug 2009 00:34:50 +0000
I believe this has been fixed in -current since early July.
Could you verify that this issue is no longer present in -current?

Home | Main Index | Thread Index | Old Index