Re: kern/41579 (siisata driver interrupt and command completion handling broken)

To: jakllsch%NetBSD.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, Wolfgang.Stukenbrock%nagler-company.com@localhost
Subject: Re: kern/41579 (siisata driver interrupt and command completion handling broken)
From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock%nagler-company.com@localhost>
Date: Thu, 20 Aug 2009 10:20:05 +0000 (UTC)

The following reply was made to PR kern/41579; it has been noted by GNATS.

From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock%nagler-company.com@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: jakllsch%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost, 
gnats-admin%NetBSD.org@localhost
Subject: Re: kern/41579 (siisata driver interrupt and command completion 
handling broken)
Date: Thu, 20 Aug 2009 12:16:15 +0200

 Hi, I've a little problem with this request ....

 OK, I've integrated the driver into the 4.0 kernel, because I need it 
 there. (I've still no time to have a look at 5.0.1 ...)
 I've fixed the interrupt handling by the patch I've send to NetBSD.org, 
 because the driver was instable.
 With the patch it seems to be stable now - no problems up to now anymore 
 with only disks connected to the controler.
 And all three system with this controler type installed are productive 
 systems! They run 24 hours 7 days a week without any interruption ...

 I've no "free" system with that hardware to install an other OS-version 
 on it at the moment.
 I have a spare controler of this type, but currently no free MB where I 
 can insert it - the controler needs a PCI-X-slot ...

 If it's good for you that  I just check the sources, I will do it again 
 - see below.
 I've done it prior sending the bug report and the changes at that time 
 there doesn't affect the problem, but there may be something new now.

 And there are "some" major changes ...

 But the PR_PC_INCOR bit is again not set in the driver and so one main 
 whole is still present where interrupts can be lost.
 It is very hard to garantie that there will be no interrupt lost on the 
 port, if you clear the cmd-complete interrupt by a side effect!
 And as a side effect not setting this bit makes debugging of the 
 behaviour here impossible, because you cannot check the PSS register ...

 In siisata_intr() you loop now until no bits are set in GR_GIS anymore 
 and check the IS in an inner loop for all channels.
 Then you check the PIS for the channel and if CMPCMPL is set, you read 
 PSS that clears the CMPCMPL interupt as side effect.
 This looks OK as long as only one slot is used at all - and I hope this 
 performance issue will be addresses in the near future ...
 You have moved the whole access to the chip into the intr_port routine - 
 was in cmd_complete before.
 This looks good, but there is still the "clear all interrupts" part in ...
 I would prefere to eliminate this and clear only the interrups gooing to 
 serve. This is more stable and of cause required if more than on slot 
 will be used in the future.
 A check with panic() may make sence if there comes any unexpetec 
 interrupt at all - at least if DIGANOSTICS is defined ...

 The missing timeout-check in bio_complete is there now.
 The usage of PRO_PCS/PRO_PCC has to be corrected.

 summary of my analyses:
 The main cause why interrupts are lost in the previous version has been 
 fixed by moving all chip-access in front of the completion handling.
 This problem schould be solved in the latest version of the driver.
 The lost timeout should be fixed and the PCS/PCC usage looks good to me.

 Neverless I would strongly recommend to set the PR_PC_INCOR bit to 
 remove this kind of side effect in interrupt handling.
 It is alway a bad style to use HW side effects, because they normaly 
 generate sync-wholes and you need to write to PIS in any case - currenly 
 still with 0xffffffff without checking for unexpected interrupts - 
 another point that should be changed I think. At least a WARNING should 
 be printed if there is an unexpected interrupt, so that anybody can see 
 that there is still a problem with the driver ...
 I still prefere a panic() on unecpected interrupts, but thats my way to 
 make drivers stable - do not allow any unexpected behaviour of the HW ...
 If you just clear the intterupt, it may come up again and than you may 
 run into a busy-loop by processing (or better not processing) the 
 HW-interrupt. And you will not see it with the current implementation ...
 It is always a good idea for interrupt handlers to clear exactly those 
 interrupts only, that they are gooing to serve. Otherwiese some 
 interrupts may get lost - OK as long as only one slot is used this 
 cannot happen here ...

 Sorry that I've no way to run the current version at the moment - 
 neverless testing this problem by running the system may only show that 
 there is still a problem. It will never say, that there is no one left - 
 for this kind of statement, you need to analyse the source code of the 
 interrupt handler.
 If the request for feedback was intended by a "close-request" for the 
 report, the current source code looks good to me and the problem report 
 can be closed.

 By the way. This mail reached me today at 02:34 (AM) and a reminder for 
 that came today at 07:11 (AM).
 Is it normal that the systems sends a reminder after less than 5 hours?

 W. Stukenbrock

 jakllsch%NetBSD.org@localhost wrote:

 >Synopsis: siisata driver interrupt and command completion handling broken
 >
 >State-Changed-From-To: open->feedback
 >State-Changed-By: jakllsch%NetBSD.org@localhost
 >State-Changed-When: Thu, 20 Aug 2009 00:34:50 +0000
 >State-Changed-Why:
 >I believe this has been fixed in -current since early July.
 >Could you verify that this issue is no longer present in -current?
 >
 >
 >

Prev by Date: Re: kern/41579 (siisata driver interrupt and command completion handling broken)
Next by Date: Re: kern/41905: System Crash
Previous by Thread: Re: kern/41579 (siisata driver interrupt and command completion handling broken)
Next by Thread: Re: kern/41579 (siisata driver interrupt and command completion handling broken)
Indexes:

Home | Main Index | Thread Index | Old Index