Subject: Re: Unlocking an unlocked mutex
To: Vincent <10.50@free.fr>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 10/13/2005 11:02:26

On Thu, Oct 13, 2005 at 07:24:46PM +0200, Vincent wrote:
> Bill Studenmund wrote:
>
> >My day job is writing an application that uses pthreads. These assertions
> >REALLY help. The problem with unlocking an unlocked mutex is that you most
> >likely just finished modifying data that another thread thought _it_ had
> >exclusive access to. You can end up with problems showing up in whatever
> >code goes to use the unprotected data, so the end error looks like it's
> >miles away from where the bug is.
>
> Could be. As I said, I did not analyse the program any further. The
> error happens when you switch between audio channels, so I presume the
> mutex has to do with allocating the audio hardware, or something close
> to that. Anyhow, ignoring the error does not cause any core dump or
> malfunction, so I assume it is just *dirty* programming.

As far as you can tell so far.

The real problem with these kinds of bugs is that they usually slip
through testing. You won't SEE a problem if your activity load is light
enough that you effectively only have one thread wanting to run at once;
unless there's another thread that has this mutex locked and is modifying
the protected variables, you won't have a problem. And even "heavy" loads
aren't necessarily enough to expose the problem.

But if you hit the right, heavy load, BOOM!

I recently had a situation where a few (per second) small transactions
were fine, a lot of small transactions were fine, a few large transactions
were fine, but a lot of large transactions hit a problem. Sadly this
problem wasn't a mutex locking one, so I didn't get a clear indication of
the issue. But the idea is the same: seemingly similar workloads can
trigger different code dynamics.

All of that (and this thread) said, I understand your frustration at the
prospect of trying to debug such an errant program. I had the advantage
that my application was developed under NetBSD, so I was "informed" of
locking errors as soon as I made them. It may not be that easy to find out
where the error really is. Also, the original programmers may not have
included enough info for you to figure out what's supposed to be
happening. I have learned the hard way to note which locks are supposed to
be held and which are supposed to be unlocked, both on entry and on exit,
as part of a routine's comments.

Take care,

Bill
