Source-Changes-D archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: CVS commit: src/sys/dev/scsipi



On Fri, Apr 20, 2012 at 10:03:09AM +1000, matthew green wrote:
> 
> > > scsipi depends upon kernel lock.  thus, callers should arrange for it
> > > to be held.  since these are drivers calling, it is upto each driver
> > > to do this currently.  this is what we've done with other problems we
> > > have hit as they've arrived.
> > 
> > if the caller is MPSAFE, I agree. Non-MPSAFE driver should not have to care
> > about KERNEL_LOCK (precisely because they are not MPSAFE).
> > The basic assumption about kernel locking was that that driver and 
> > subsystems
> > using spl locking would continue to work as it, without changes. If we
> > change this assumption a lot of code needs to be changed (that is, almost
> > all our drivers).
> 
> we've never had autoconfig run with the kernel lock AFAICT, so this
> assumption has never been true.

So this is a bug. The contract was really that spl-locked drivers would
continue to work as is when fine-grained locking was introduced.
I don't think it's reasonable to require every (but the few
already MP-safe) drivers to take KERNEL_LOCK in their attach
routine.

> the thing that does work is that
> IPL_VM using drivers will automatically get kernel locked in their
> interrupt handlers.

It's not only IPL_VM drivers, it'a all driver wich don't register
and interrupt handler explicitely marked MPSAFE (for example,
with PCI_INTR_MPSAFE for PCI drivers).

> 
> > > these KASSERT()s have been in place for a long time now and, until
> > > your change, we've fixed it at the caller level not worked around it
> > > at the scsipi level.
> > 
> > AFAIK the only drivers that has been changed are ncr53c9x and USB, and they
> > are MP-safe (they already use mutex instead of spl locking).
> 
> usb is not MP safe in -current, only on the usbmp branch.  all those
> drivers take care of taking the kernel lock when calling into the
> scsipi code since they almost *always* run when not cold anyway.

So things were fixed at the wrong place. Any non-MPSAFE driver is at
risk right now

> infact, the asserts were added in -current due to the usbmp work
> and the understanding that they would advance the rest of the system.
> 
> > > your change can't work with the drivers that are broken without the
> > > major regression of holding kernel lock over autoconfig, or, never
> > > ever having them be modules or detached/reattached.
> > 
> > Maybe non-MPSAFE drivers can't be modules or detached/reattached, indeed.
> 
> that's not true.  eg, radeondrm works just fine being MP safe or not.
> (it actually is MP safe, though it doesn't announce this to any of the
> rest of the system currently.)

Yes, of course, unless there's a KASSERT, things will run fine most of
the time. That's the problem with race conditions: it doesn't reliably
fail. 

> 
> > But that's a problem with autoconf not dealing with non-MPSAFE drivers,
> > not the driver themselve (they were working before the MP changes, they
> > should continue to work as-is). If you want autoconf to not run
> > under the KERNEL_LOCK when not needed, then is has to be extended so that
> > MPSAFE drivers can register themselves as MPSAFE, so that non-MPSAFE
> > drivers are still called with KERNEL_LOCK.
> 
> it seems the wrong answer to me.  instead of fixing the one or
> two drivers you've found problems in, you want to add more
> infrastructure (IMO clutter) to the system and patch all the
> drivers that *are* ok.  considering that this problem is
> very specific to scsipi, i don't think that is a viable
> solution.

No, it's not specific to scsipi. It happens that scsipi asserts that
KERNEL_LOCK is held, but any non-MPSAFE drivers needs to be called with
KERNEL_LOCK held.

> 
> > > please make "drvctl -d" and rescan work for your drivers.  this will
> > > make the checks against "cold" go away.
> > 
> > I don't want to touch each driver in our tree for drvctl -d/rescan to work.
> 
> as in aside, you should make "drvctl -d" work for any drivers
> you maintain.  everyone (tm) should.  irrespective of modules
> or otherwise, the end result of it working tends to be bug
> fixes.

It's hard to drvctl -d the root device ...

> 
> > If we want this to work for non-MPSAFE driver, then this has to be fixed
> > in autoconf (if the problem really exists, which I've not looked at yet).
> 
> i don't believe there is a problem in autoconfig.  there is a
> problem with some (probably most) scsi drivers here, but that
> is about it.  it is true that the contracts have changed, but
> this is progress after all.  the assert exists solely to find
> places that are wrong and fix them.  your change hides this.

the contract has changed in a way that exposes all non-MPSAFE driver
to a race condition. The problem shows up in scsipi, but it's not the
only place where the problem exists. ata is not MP-safe, so the
problem is also in IDE and SATA drivers (including some USB drivers).
mii(4) and ifmedia is not MP-safe, so ethernet drivers have this problem
too.

I've not audited the whole code but there's probably other examples.

> 
> this isn't about SMPifing a whole driver, but adding a call to
> taken/release the kernel lock around a few (often just one)
> calls in a driver.  it has never been a difficult change to
> make.  if you have drivers that needs it, i am happy to send
> patches for testing.

*ALL* non-MPSAFE drivers needs it. 

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--


Home | Main Index | Thread Index | Old Index