Subject: Re: Is there any way to tell if the current thread has a user context?
To: Matthew Orgass <darkstar@city-net.com>
From: Alan Ritter <ritter.alan@gmail.com>
List: tech-kern
Date: 09/12/2005 19:21:05
--nextPart1268741.HFOXuQuMRg
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

On Monday 12 September 2005 6:53 am, Matthew Orgass wrote:
> On 2005-09-09 ritter.alan@gmail.com wrote:
> > OK, I added macros that raise the IPL, then grab a simple lock
> > everyplace there is synchronization required between the top and bottom
> > halves
>
>   Pass the variable name to the macros and use do { ... } while (0)
> around multiple statements in macros (so that if (...) MACRO; works as
> expected).

Thanks, I did this.

> > There were a lot of places I was using lockmgr locks that I removed,
> > and the driver seems to work just the same (maybe a bit faster).  I
> > originally just replaced the freebsd mtx_lock(&lock) with
> > lockmgr(&lock, LK_EXCLUSIVE, NULL), and I tried removing this entirely.
> >  This seemed to cause problems however, so I put it back.  There are a
> > few places in the FreeBSD code where they do mtx_lock(&Giant), so I'm
> > thinking I might need to use lockmgr locks there.
>
>   The other cases where lockmgr is used is where code being called
> expects to be able to sleep without releasing the lock (vfs does this
> with vnode locks) or acquire the lock recursively.  Both should be
> avoided if possible.  In particular, I think it is a good idea to
> separate
> refcounting where you only care that the object continues to exist from
> locking where you want to access a consistant version of the object.
> Both of these are necessary concerns even on SP machines to avoid
> unexpected variable modification, invalid references, and memory leakage.
> It is good to specify what the lock protects if not the entire data
> structure that holds it.

As far as I know there isn't anyplace it goes to sleep while holding a lock=
=2E =20
I'll look through the code to see if I can find anything like this though.

>   I would guess the Giant lock is equivalent to NetBSD's KERNEL_LOCK.  My
> understanding is that you shouldn't have to worry about that right now
> since currently only one CPU is allowed to be in the kernel at any time
> except for a few special cases (most drivers do not yet do locking).

Thanks, good to know.

>   The other places I see mtx_lock used are:
> * kern_ndis.c as a pointless interlock (pointless because the test is
> done outside the lock and the wakeup doesn't use the lock)

Yes, I #ifdef'ed out the mtx_lock() calls here, and have them replaced with=
=20
my new THR_LOCK() macro, which raises the IPL, and aquires a simple lock.

> * kern_windrv.c currently need lockmgr due to malloc call in
> ExAllocatePoolWithTag from IoCreateDevice which is called under the lock,
> apparently to protect the driver_object from being deallocated.  However,
> the time between windrv_lookup and use of the drv is not protected, so
> this looks like bad lock usage.

OK, so there's a call to ltsleep() in malloc() (in sys/kern/kern_malloc.c).=
 =20
So calling malloc() could cause a context switch, and thus any critical=20
sections with a call to malloc() need a sleep lock around them?  I wasn't=20
aware of this, so thanks for pointing it out.

>   What problems did you see without the mtx_lock calls?

Interestingly enough, I just tried commenting them out again, and I'm not=20
seeing any problems (I just downloaded a 170 MB file, .)  I think I may hav=
e=20
tried removing the lockmgr calls in mtx_lock() before removing the spin=20
loop in KefAcquireSpinLockAtDpcLevel(), but thought I'd done it afterwards=
=20
(sorry about this).

>   Your hang with the compat spin locks enabled means that something is
> sleeping or returning without releasing the lock.  The could indicate a
> code problem that affects SP machines also.

Yes, something is wrong here.  I'm assuming that by "compat spin locks" you=
=20
are refering to KefAcquireSpinLockAtDpcLevel(), where I use=20
atomic_cmpset_acq_int() to test and set the "struct kspin_lock".  I copied=
=20
=46reeBSD's atomic_cmpset_acq_int() (from sys/i386/include/atomic.h and put=
 it=20
in nbcompat.h (NetBSD doesn't have this i386 assembly function), so I may=20
have done something wrong here.  The kspin_lock is equivelent to the=20
Windows KSPIN_LOCK, which is just 4 bytes, so they can't use a "struct mtx"=
=20
in place of this in FreeBSD.  I just noticed, however that the NetBSD=20
"struct simplelock" is only 4 bytes as well (as long as LOCKDEBUG isn't=20
defined), so I could use a simple lock here instead (I'll try this out). =20
If this doesn't work then you are probably right, and the lock isn't=20
getting released where it should.

>   Also, in ndis_set_info you can return without releasing NDIS_LOCK.

Thanks for pointing this out.  I added the NDIS_LOCK()/NDIS_UNLOCK() there=
=20
myself, as it was in the FreeBSD-current sources (the sources I'm working=20
off are from FreeBSD 5.4).  Anyway, I fixed it.

Thanks for your help :-)

--nextPart1268741.HFOXuQuMRg
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (NetBSD)

iD8DBQBDJdUkdrBy3JhXjT8RAt7KAJ4vmhsQcbKVs3S1YvQAB08wdNqvvgCfYftI
9ScaJDRcq7ZnpUFTmUVo+mw=
=G87j
-----END PGP SIGNATURE-----

--nextPart1268741.HFOXuQuMRg--