tech-kern archive


Re: mpt(4) timeout recovery improvements



BB> I know we've been corresponding on this for a while
Yes, thanks again for that.

EF> Now, what's the correct way of resetting/initializing the IOC and returning
EF> everything to scsipi? I guess the correct order is to reset (which leaves
EF> the IOC in the stopped state), then to set xs->error and call
EF> scsipi_done(xs) on all pending operations, and then to init the IOC (which
EF> empties the request queue).
BB> No, that order won't work.  If you look at the patch, I return all
BB> pending SCSI requests to the driver instance before resetting the IOC.
Yes, I realize that.
My concern with that order (requeue, reset, init) is that the IOC itself 
runs asynchronously to the kernel. So, while the kernel is requeuing the 
requests, the IOC, still running, may serve one of these requests.
While most SCSI requests should in fact be idempotent (i.e., running them 
a second time does no harm), I'm still worried about it.
If I reset the IOC first, it stops. I should then have a stable state of 
the request queue to decide what to re-queue.

BB> If you don't return the requests to the SCSI layer before resetting
BB> the card, you'll lose the requests entirely and potentially create
BB> a real mess.
I can't see where I'm losing requests on reset (mpt_soft_reset()). I realize 
that initializing the IOC (mpt_init()) clears the request queue. What am I 
overlooking?

BB> You can also use XS_REQUEUE
Thanks. Incidentally, I stumbled over this just after posting my mail.
My impression is that XS_REQUEUE is, in fact, more appropriate than XS_TIMEOUT.

EF> Second question: When repeatedly calling scsipi_done(), can it happen that 
EF> scsipi tries to re-queue these requests before I return? I would then lose 
EF> them when re-initializing the IOC.
BB> No.  The splbio() takes care of that.
I don't understand why. When I call scsipi_done(), I pass control to scsipi, 
no matter what the SPL is. As far as I understand (which may not be very far), 
the only thing that prevents scsipi from queueing requests to the card is the 
channel being frozen?
So, is scsipi_channel_freeze()/thaw() the correct thing to do?
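
To make the question concrete, here is a small user-space toy model of the
sequence I have in mind (freeze, reset, hand back, init, thaw). It is only a
sketch: every model_* name is hypothetical and stands in for the role of a
kernel piece discussed above, not for the real scsipi or mpt interfaces.
Whether the real midlayer behaves like model_queue() here, i.e. refuses new
work while the channel is frozen, is exactly what I am asking.

```c
/*
 * User-space toy model only. All names are hypothetical; nothing here is
 * the actual scsipi(9) or mpt(4) code.
 */
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NREQ 4

enum model_error { MODEL_NOERROR, MODEL_REQUEUE };

struct model_xfer {
	bool in_flight;		/* handed to the (modelled) IOC */
	bool callout_armed;	/* per-request timeout still pending */
	enum model_error error;
};

struct model_softc {
	int freeze_count;	/* > 0: midlayer must not queue new work */
	bool ioc_running;
	struct model_xfer req[MODEL_NREQ];
};

/* Midlayer side: try to hand request i to the adapter. */
static bool
model_queue(struct model_softc *sc, int i)
{
	if (sc->freeze_count > 0)
		return false;	/* assumed: a frozen channel refuses work */
	sc->req[i].in_flight = true;
	sc->req[i].callout_armed = true;
	sc->req[i].error = MODEL_NOERROR;
	return true;
}

/*
 * Recovery path. Returns the number of requests handed back, or -1 if a
 * request slipped back onto the adapter before the re-init.
 */
static int
model_recover(struct model_softc *sc)
{
	int returned = 0;

	sc->freeze_count++;		/* 1. freeze: no new requests */
	sc->ioc_running = false;	/* 2. reset: IOC stops, queue is stable */

	for (int i = 0; i < MODEL_NREQ; i++) {	/* 3. hand back pending work */
		struct model_xfer *xs = &sc->req[i];

		if (!xs->in_flight)
			continue;
		xs->callout_armed = false;	/* disarm timeout first */
		xs->error = MODEL_REQUEUE;
		xs->in_flight = false;
		/* Model of a re-queue attempt from the completion path. */
		if (model_queue(sc, i))
			return -1;	/* would be lost by the init below */
		returned++;
	}

	sc->ioc_running = true;		/* 4. init: adapter queue starts empty */
	sc->freeze_count--;		/* 5. thaw: queueing allowed again */
	return returned;
}
```

In this model the freeze in step 1 is the only thing that keeps step 3 from
feeding a request straight back to an adapter that is about to be
re-initialized; the interrupt priority level plays no part in it.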

EF> Third question: Do I need to care about xs->xs_callout?
BB> Yes, you do.  If you don't clear the callout when you clear the
BB> requests, you'll find the test I wrote for null pointers triggering in the
BB> mpt_timeout function.
Thanks. I didn't realize that the mpt driver itself sets up that callout. 
I was under the impression that the scsipi layer was dealing with it.

