[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/41579 (siisata driver interrupt and command completion handling broken)
Hi, I've a little problem with this request ....
OK, I've integrated the driver into the 4.0 kernel, because I need it
there. (I've still no time to have a look at 5.0.1 ...)
I've fixed the interrupt handling by the patch I've send to NetBSD.org,
because the driver was instable.
With the patch it seems to be stable now - no problems up to now anymore
with only disks connected to the controler.
And all three system with this controler type installed are productive
systems! They run 24 hours 7 days a week without any interruption ...
I've no "free" system with that hardware to install an other OS-version
on it at the moment.
I have a spare controler of this type, but currently no free MB where I
can insert it - the controler needs a PCI-X-slot ...
If it's good for you that I just check the sources, I will do it again
- see below.
I've done it prior sending the bug report and the changes at that time
there doesn't affect the problem, but there may be something new now.
And there are "some" major changes ...
But the PR_PC_INCOR bit is again not set in the driver and so one main
whole is still present where interrupts can be lost.
It is very hard to garantie that there will be no interrupt lost on the
port, if you clear the cmd-complete interrupt by a side effect!
And as a side effect not setting this bit makes debugging of the
behaviour here impossible, because you cannot check the PSS register ...
In siisata_intr() you loop now until no bits are set in GR_GIS anymore
and check the IS in an inner loop for all channels.
Then you check the PIS for the channel and if CMPCMPL is set, you read
PSS that clears the CMPCMPL interupt as side effect.
This looks OK as long as only one slot is used at all - and I hope this
performance issue will be addresses in the near future ...
You have moved the whole access to the chip into the intr_port routine -
was in cmd_complete before.
This looks good, but there is still the "clear all interrupts" part in ...
I would prefere to eliminate this and clear only the interrups gooing to
serve. This is more stable and of cause required if more than on slot
will be used in the future.
A check with panic() may make sence if there comes any unexpetec
interrupt at all - at least if DIGANOSTICS is defined ...
The missing timeout-check in bio_complete is there now.
The usage of PRO_PCS/PRO_PCC has to be corrected.
summary of my analyses:
The main cause why interrupts are lost in the previous version has been
fixed by moving all chip-access in front of the completion handling.
This problem schould be solved in the latest version of the driver.
The lost timeout should be fixed and the PCS/PCC usage looks good to me.
Neverless I would strongly recommend to set the PR_PC_INCOR bit to
remove this kind of side effect in interrupt handling.
It is alway a bad style to use HW side effects, because they normaly
generate sync-wholes and you need to write to PIS in any case - currenly
still with 0xffffffff without checking for unexpected interrupts -
another point that should be changed I think. At least a WARNING should
be printed if there is an unexpected interrupt, so that anybody can see
that there is still a problem with the driver ...
I still prefere a panic() on unecpected interrupts, but thats my way to
make drivers stable - do not allow any unexpected behaviour of the HW ...
If you just clear the intterupt, it may come up again and than you may
run into a busy-loop by processing (or better not processing) the
HW-interrupt. And you will not see it with the current implementation ...
It is always a good idea for interrupt handlers to clear exactly those
interrupts only, that they are gooing to serve. Otherwiese some
interrupts may get lost - OK as long as only one slot is used this
cannot happen here ...
Sorry that I've no way to run the current version at the moment -
neverless testing this problem by running the system may only show that
there is still a problem. It will never say, that there is no one left -
for this kind of statement, you need to analyse the source code of the
If the request for feedback was intended by a "close-request" for the
report, the current source code looks good to me and the problem report
can be closed.
By the way. This mail reached me today at 02:34 (AM) and a reminder for
that came today at 07:11 (AM).
Is it normal that the systems sends a reminder after less than 5 hours?
Synopsis: siisata driver interrupt and command completion handling broken
State-Changed-When: Thu, 20 Aug 2009 00:34:50 +0000
I believe this has been fixed in -current since early July.
Could you verify that this issue is no longer present in -current?
Main Index |
Thread Index |