NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/41579 (siisata driver interrupt and command completion handling broken)
Hi, I've a little problem with this request ....
OK, I've integrated the driver into the 4.0 kernel, because I need it 
there. (I've still no time to have a look at 5.0.1 ...)
I've fixed the interrupt handling by the patch I've send to NetBSD.org, 
because the driver was instable.
With the patch it seems to be stable now - no problems up to now anymore 
with only disks connected to the controler.
And all three system with this controler type installed are productive 
systems! They run 24 hours 7 days a week without any interruption ...
I've no "free" system with that hardware to install an other OS-version 
on it at the moment.
I have a spare controler of this type, but currently no free MB where I 
can insert it - the controler needs a PCI-X-slot ...
If it's good for you that  I just check the sources, I will do it again 
- see below.
I've done it prior sending the bug report and the changes at that time 
there doesn't affect the problem, but there may be something new now.
And there are "some" major changes ...
But the PR_PC_INCOR bit is again not set in the driver and so one main 
whole is still present where interrupts can be lost.
It is very hard to garantie that there will be no interrupt lost on the 
port, if you clear the cmd-complete interrupt by a side effect!
And as a side effect not setting this bit makes debugging of the 
behaviour here impossible, because you cannot check the PSS register ...
In siisata_intr() you loop now until no bits are set in GR_GIS anymore 
and check the IS in an inner loop for all channels.
Then you check the PIS for the channel and if CMPCMPL is set, you read 
PSS that clears the CMPCMPL interupt as side effect.
This looks OK as long as only one slot is used at all - and I hope this 
performance issue will be addresses in the near future ...
You have moved the whole access to the chip into the intr_port routine - 
was in cmd_complete before.
This looks good, but there is still the "clear all interrupts" part in ...
I would prefere to eliminate this and clear only the interrups gooing to 
serve. This is more stable and of cause required if more than on slot 
will be used in the future.
A check with panic() may make sence if there comes any unexpetec 
interrupt at all - at least if DIGANOSTICS is defined ...
The missing timeout-check in bio_complete is there now.
The usage of PRO_PCS/PRO_PCC has to be corrected.
summary of my analyses:
The main cause why interrupts are lost in the previous version has been 
fixed by moving all chip-access in front of the completion handling.
This problem schould be solved in the latest version of the driver.
The lost timeout should be fixed and the PCS/PCC usage looks good to me.
Neverless I would strongly recommend to set the PR_PC_INCOR bit to 
remove this kind of side effect in interrupt handling.
It is alway a bad style to use HW side effects, because they normaly 
generate sync-wholes and you need to write to PIS in any case - currenly 
still with 0xffffffff without checking for unexpected interrupts - 
another point that should be changed I think. At least a WARNING should 
be printed if there is an unexpected interrupt, so that anybody can see 
that there is still a problem with the driver ...
I still prefere a panic() on unecpected interrupts, but thats my way to 
make drivers stable - do not allow any unexpected behaviour of the HW ...
If you just clear the intterupt, it may come up again and than you may 
run into a busy-loop by processing (or better not processing) the 
HW-interrupt. And you will not see it with the current implementation ...
It is always a good idea for interrupt handlers to clear exactly those 
interrupts only, that they are gooing to serve. Otherwiese some 
interrupts may get lost - OK as long as only one slot is used this 
cannot happen here ...
Sorry that I've no way to run the current version at the moment - 
neverless testing this problem by running the system may only show that 
there is still a problem. It will never say, that there is no one left - 
for this kind of statement, you need to analyse the source code of the 
interrupt handler.
If the request for feedback was intended by a "close-request" for the 
report, the current source code looks good to me and the problem report 
can be closed.
By the way. This mail reached me today at 02:34 (AM) and a reminder for 
that came today at 07:11 (AM).
Is it normal that the systems sends a reminder after less than 5 hours?
W. Stukenbrock
jakllsch%NetBSD.org@localhost wrote:
Synopsis: siisata driver interrupt and command completion handling broken
State-Changed-From-To: open->feedback
State-Changed-By: jakllsch%NetBSD.org@localhost
State-Changed-When: Thu, 20 Aug 2009 00:34:50 +0000
State-Changed-Why:
I believe this has been fixed in -current since early July.
Could you verify that this issue is no longer present in -current?
Home |
Main Index |
Thread Index |
Old Index