Source-Changes-D archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: CVS commit: src/sys/dev/scsipi



On Fri, Apr 20, 2012 at 06:06:35PM +1000, matthew green wrote:
> 
> > > we've never had autoconfig run with the kernel lock AFAICT, so this
> > > assumption has never been true.
> > 
> > So this is a bug. The contract was really that spl-locked drivers would
> > continue to work as is when fine-grained locking was introduced.
> 
> there are no actual bugs or races here.  this is about cleanup
> and ensuring code runs how it expects.

And that's the point: all code that use spl-style locking expects
to be called with KERNEL_LOCK held. And this includes the attach
routine.

> 
> > I don't think it's reasonable to require every (but the few
> > already MP-safe) drivers to take KERNEL_LOCK in their attach
> > routine.
> 
> i'm not saying that they should.  i'm saying that scsipi drivers,
> regardless of whether they are fully MPSAFE or not, should now
> arrange to have the kernel lock held when calling into scsipi at
> all times.  this really has nothing to do with attach, except that
> attach is problem to fix currently in some drivers.

Almost all scsipi drivers ...

> 
> > It's not only IPL_VM drivers, it'a all driver wich don't register
> > and interrupt handler explicitely marked MPSAFE (for example,
> > with PCI_INTR_MPSAFE for PCI drivers).
> 
> interesting; i hadn't seen PCI_INTR_MPSAFE before.  however, it
> seems ignored / unused.  the x86 pci_intr_setattr() does do
> something but it's never called.  none of the rest do anything.

this just means there's no PCI MP-SAFE driver yet ?

> 
> my point is that drivers get interrupt protection from asking for
> an IPL_VM interrupt.  that's what tells the MD code to make sure
> the kernel lock is held for these interrupt handlers.

Or explicitely declared MPSAFE. at last x86 allows this.

> 
> > > > > these KASSERT()s have been in place for a long time now and, until
> > > > > your change, we've fixed it at the caller level not worked around it
> > > > > at the scsipi level.
> > > > 
> > > > AFAIK the only drivers that has been changed are ncr53c9x and USB, and 
> > > > they
> > > > are MP-safe (they already use mutex instead of spl locking).
> > > 
> > > usb is not MP safe in -current, only on the usbmp branch.  all those
> > > drivers take care of taking the kernel lock when calling into the
> > > scsipi code since they almost *always* run when not cold anyway.
> > 
> > So things were fixed at the wrong place. Any non-MPSAFE driver is at
> > risk right now
> 
> i don't understand.  eg, the usbmp code never calls into scsipi
> during autoconfig.

usb is special in this regard.

> these asserts aren't specific to autoconfig,
> and IIRC, most drivers do not call into scsipi during attach.

Most if not all regular scsi drivers do: they do a config_found()
in their attach routine to attach the scsibus.
ATA does it from a kernel thread.

> 
> > > > > your change can't work with the drivers that are broken without the
> > > > > major regression of holding kernel lock over autoconfig, or, never
> > > > > ever having them be modules or detached/reattached.
> > > > 
> > > > Maybe non-MPSAFE drivers can't be modules or detached/reattached, 
> > > > indeed.
> > > 
> > > that's not true.  eg, radeondrm works just fine being MP safe or not.
> > > (it actually is MP safe, though it doesn't announce this to any of the
> > > rest of the system currently.)
> > 
> > Yes, of course, unless there's a KASSERT, things will run fine most of
> > the time. That's the problem with race conditions: it doesn't reliably
> > fail. 
> 
> that's not what i said.  what i said is that it is very trivial to have
> a non MPSAFE-driver handle being a module, being detached, and being
> reattached.

The discussion is about KERNEL_LOCK, and possible race of running
non-MPSAFE code without it.


> > 
> > No, it's not specific to scsipi. It happens that scsipi asserts that
> > KERNEL_LOCK is held, but any non-MPSAFE drivers needs to be called with
> > KERNEL_LOCK held.
> 
> why?  we've never done this for several years now and there's no reason
> why we think there is a problem with it currently.

why did you add this KASSERT to scsipi then ?

> we've actually run
> the interrupt part of autoconfig with multiple threads since sometime
> between netbsd 4.0 and 5.0.

this doesn't mean the races don't exists, just that we don't run into
them most of the time. Maybe that in some common case there's no possible
race, but that would be by accident, not by design.

> 
> > > as in aside, you should make "drvctl -d" work for any drivers
> > > you maintain.  everyone (tm) should.  irrespective of modules
> > > or otherwise, the end result of it working tends to be bug
> > > fixes.
> > 
> > It's hard to drvctl -d the root device ...
> 
> eh, pxe is your friend.  :-)  

I can pxe boot the kernel; having root on NFS is another things.

> 
> > > > If we want this to work for non-MPSAFE driver, then this has to be fixed
> > > > in autoconf (if the problem really exists, which I've not looked at 
> > > > yet).
> > > 
> > > i don't believe there is a problem in autoconfig.  there is a
> > > problem with some (probably most) scsi drivers here, but that
> > > is about it.  it is true that the contracts have changed, but
> > > this is progress after all.  the assert exists solely to find
> > > places that are wrong and fix them.  your change hides this.
> > 
> > the contract has changed in a way that exposes all non-MPSAFE driver
> > to a race condition. The problem shows up in scsipi, but it's not the
> > only place where the problem exists. ata is not MP-safe, so the
> > problem is also in IDE and SATA drivers (including some USB drivers).
> > mii(4) and ifmedia is not MP-safe, so ethernet drivers have this problem
> > too.
> > 
> > I've not audited the whole code but there's probably other examples.
> 
> could be.  we should fix them.  the only way we can find them is
> if we add things like KASSERT()s to the relevant places and then go
> fix the callers.  that's what i'm asking you to do here.

Sorry, but I won't fix all ATA, SCSI and ethernet drivers. I'll go
fix autoconfig instead.

> 
> > > this isn't about SMPifing a whole driver, but adding a call to
> > > taken/release the kernel lock around a few (often just one)
> > > calls in a driver.  it has never been a difficult change to
> > > make.  if you have drivers that needs it, i am happy to send
> > > patches for testing.
> > 
> > *ALL* non-MPSAFE drivers needs it. 
> 
> how can all drivers need scsipi specific code?  *most* of the
> scsipi drivers do not actually call into scsipi in the attach
> routine, from what i recall looking a few months ago.

most of them do config_found() in their attach routine to attach
their scsibuses (but then we're back to autoconfig: scsipi not being
MPSAFE, autoconfig should not call its attach routine without KERNEL_LOCK).

> 
> what i'm asking you for is a list of actual problems you've seen
> with the KASSERT() you changed so i can fix those, and revert
> your change.  please help me advance the state of netbsd smp.

I think you have the list: all scsi drivers.

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--


Home | Main Index | Thread Index | Old Index