tech-kern: Re: raidstrategy() isn't interrupt-safe

Subject: Re: raidstrategy() isn't interrupt-safe
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Greg Oster <oster@cs.usask.ca>
List: tech-kern
Date: 05/14/2005 11:09:42

Manuel Bouyer writes:
> Hi,
> a recent report on the port-xen list shows this stack trace:
> panic: pool_get: rf_rad_pl: must have NOWAIT
> Stopped at      netbsd:cpu_Debugger+0x4        leave
> cpu_Debugger(2ff,c9f31,2fe,401,c0657c4c) at netbsd:cpu_Debugger+0x4
> panic(c05bb5c0,c0569d73,c0720d50,c04cc336,c09a6700) at netbsd:panic+0x121
> pool_get(c0657c4c,2,c0e26000,10000,0) at netbsd:pool_get+0xf9
> rf_AllocRaidAccDesc(c0aa2000,72,401,0,1) at netbsd:rf_AllocRaidAccDesc+0x30
> rf_DoAccess(c0aa200,72,1,401,0) at netbsd:rf_DoAccess+0x40
> raidstart(c0aa2000,c0b07000,0,0,1) at netbsd:raidstart+0x28f
> raidstrategy(c0b07000,1,c0a4c118,c0acc800,c09d1084) at
> netbsd:raidstrategy+0x136
> ccdstart(c0a6d000,c09cd000,0,c0a4c118,c0a4c118) at netbsd:ccdstart+0x137
> ccdstrategy(c09cd000,1,1,c09cd07c,0) at netbsd:ccdstrategy+0x142
> xbdback_do_io(c09cd000,c09cd000,c0720dc0,c0212b42,c0a4c118) at
> 
> raidstrategy() ends up being called from interrupt context here,
> and ends up calling pool_get(PR_WAITOK).
> Quoting a previous post from Jason Thorpe on tech-kern:
> > There are lots of other things that might cause a disk's strategy  
> > routine to be called from interrupt context (ccd / raidframe are good  
> > examples).  Really, we need to audit ALL of the disk strategy  
> > routines and ensure that they are IPL_BIO interrupt-context safe.
> 
> so raidframe is broken in this respect. Should I file a PR about this?

Please.  If nothing else, as a placeholder/reminder.

I'm not sure what a fix for this will look like. It might require
decoupling the "queuing bits" from the "doing bits" in raidstrategy().
In any event, far too much gets done in raidstrategy() for it to be
called from interrupt-land...
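
To make the "queuing bits" vs. "doing bits" idea a little more concrete,
here is a rough userspace sketch, not RAIDframe code. All names in it
(io_req, io_queue, sketch_strategy, sketch_worker_drain) are made up for
illustration, and malloc() merely stands in for a sleeping allocation
such as pool_get(..., PR_WAITOK). The assumption is that the strategy
path only links the request onto a list (no allocation, no sleeping),
and a separate thread-context worker later does everything that may
block. Locking and the worker wakeup are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct buf. The link field is embedded,
 * so queuing a request needs no memory allocation. */
struct io_req {
	struct io_req *next;
	int blkno;
	int done;
};

/* Simple FIFO of pending requests. */
struct io_queue {
	struct io_req *head;
	struct io_req **tailp;
	int pending;
};

static void
ioq_init(struct io_queue *q)
{
	q->head = NULL;
	q->tailp = &q->head;
	q->pending = 0;
}

/* "Queuing bits": only links the request onto the list. Nothing here
 * allocates or sleeps. A real driver would also need to block
 * disk interrupts or hold a lock around this, and wake the worker. */
static void
sketch_strategy(struct io_queue *q, struct io_req *r)
{
	r->next = NULL;
	r->done = 0;
	*q->tailp = r;
	q->tailp = &r->next;
	q->pending++;
}

/* "Doing bits": meant to run in thread context, where a blocking
 * allocation is acceptable. Returns the number of requests handled. */
static int
sketch_worker_drain(struct io_queue *q)
{
	struct io_req *r;
	int n = 0;

	while ((r = q->head) != NULL) {
		q->head = r->next;
		if (q->head == NULL)
			q->tailp = &q->head;
		q->pending--;

		/* Stand-in for an allocation that may sleep. */
		void *desc = malloc(64);
		assert(desc != NULL);
		r->done = 1;	/* pretend the access was issued */
		free(desc);
		n++;
	}
	return n;
}
```

With something along these lines, a caller in interrupt context would
only ever reach sketch_strategy(), and the descriptor allocation would
happen later from the worker.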

I'm certainly open to suggestions here...

Later...

Greg Oster