Subject: Re: Threading problems
To: Nathan J. Williams <>
From: Eric Haszlakiewicz <>
List: tech-kern
Date: 11/23/2004 15:54:43
On Tue, Nov 23, 2004 at 04:17:02PM -0500, Nathan J. Williams wrote:
> Eric Haszlakiewicz <> writes:
> > 	So why don't we initialize them then?  I can see how we would want to
> > avoid taking the performance hit on locking for programs that don't need
> > it, but wouldn't the impact of initialization be much smaller? 
> Initialization is something of a red herring; static initialization of
> mutexes and condvars does happen without libpthread linked in.
> The bigger problem is that you don't know what state things are
> supposed to be in. Consider a sequence (which I've seen) that is
> effectively like:
>   mutex_lock()
>   dlopen("")
>   mutex_unlock()
> For this to work at all, the dlopen() has to cause the mutex routines
> to bind to the real thing, which itself will take some
> trickiness we don't have yet (or another layer of indirection in libc,
> which slows down both threaded and nonthreaded programs). But once
> that's working, the mutex_unlock() will be unlocking something that
> was operated on by a dummy mutex_lock(). There's no way that the code
> can magically know which bits of memory are suddenly supposed to
> become locked mutexes.

	So mutex_lock() would actually have to lock the mutex, just not in
a thread-safe manner.  And if it doesn't have to be thread safe, that
is just setting a variable, right?

So, current no-pthread mutex_lock():
	checks value of global int (__isthreaded)

non-thread safe way:
possible code {
	if (mutex->ptm_lock != __SIMPLELOCK_UNLOCKED)
		do error
	mutex->ptm_lock = __SIMPLELOCK_LOCKED;
	return 0;	/* success, as pthread_mutex_lock() reports it */
}
	dereferences pointer (the pthread_mutex_t * passed in) (twice)
	check value of int (mutex->ptm_lock) (*)
	set value of int (mutex->ptm_lock) (*)

thread safe way: (no contention)
	dereferences pointer (the pthread_mutex_t * passed in) (once)
	call function (pthread__simple_lock_try()) (mutex->ptm_lock) (*)
	set lock 
	dereferences pointer (pointer to mutex->ptm_lock) (twice)
	check value of int (*)
	set value of int (*)
	unset lock
	set pthread_t (ptm_owner)

The thread safe way will clearly be a lot slower than the current no-pthread
way.  For this particular example, I think the non-thread safe way would
allow for the thread safe way to be switched in at any time.  The non-thread
safe way would be a little slower than the current way.  I'm assuming
the other methods can be partially stubbed out in a similar fashion.
So, the question is: for programs that use dlopen(), is the (smaller)
performance hit a worthwhile tradeoff for allowing threaded libraries to
be seamlessly loaded?  I think it probably is.
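
To make the hand-off idea concrete, here is a rough sketch in C.  It is not
the libc or libpthread code: struct sketch_mutex, stub_lock(),
real_trylock(), real_unlock() and handoff_demo() are all names made up for
illustration, and C11 atomics stand in for the simple-lock primitives.  The
only point is that a lock taken by the cheap stub leaves the same state
behind that the real routines expect to find.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins; not the real libpthread definitions. */
enum { SKETCH_UNLOCKED = 0, SKETCH_LOCKED = 1 };

struct sketch_mutex {
	atomic_int lock;	/* plays the role of ptm_lock */
};

/* Non-thread-safe stub: separate check and store, no atomic read-modify-write. */
int
stub_lock(struct sketch_mutex *m)
{
	if (atomic_load_explicit(&m->lock, memory_order_relaxed) != SKETCH_UNLOCKED)
		return -1;	/* the "do error" case: already held */
	atomic_store_explicit(&m->lock, SKETCH_LOCKED, memory_order_relaxed);
	return 0;
}

/* Thread-safe versions, standing in for what libpthread would provide. */
int
real_trylock(struct sketch_mutex *m)
{
	int expected = SKETCH_UNLOCKED;

	return atomic_compare_exchange_strong(&m->lock, &expected,
	    SKETCH_LOCKED) ? 0 : -1;
}

int
real_unlock(struct sketch_mutex *m)
{
	atomic_store(&m->lock, SKETCH_UNLOCKED);
	return 0;
}

/* Lock with the stub, then finish with the real routines. */
int
handoff_demo(void)
{
	struct sketch_mutex m;

	atomic_init(&m.lock, SKETCH_UNLOCKED);
	if (stub_lock(&m) != 0)
		return -1;
	assert(atomic_load(&m.lock) == SKETCH_LOCKED);

	/* ... the dlopen() of a threaded library would happen here ... */

	if (real_trylock(&m) == 0)	/* real code must see the mutex as held */
		return -1;
	real_unlock(&m);
	return real_trylock(&m);	/* 0 if the real unlock released it */
}
```

The sketch leaves out the owner field (ptm_owner) entirely; a real hand-off
would also have to decide what owner to record for a mutex that was locked
before any thread identity existed.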
	For someone who is really concerned about performance, ld.elf_so
could look at (e.g.) an LD_NOPTHREAD env variable and use the current
empty stub even if dlopen() is used.
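
A minimal sketch of that opt-out check, assuming the variable only needs to
be set to a non-empty value.  LD_NOPTHREAD is just the example name from
above, nopthread_requested() and keep_empty_stubs() are made-up helpers, and
a real dynamic linker would go through its own environment handling rather
than plain getenv():

```c
#include <stdlib.h>

/* Hypothetical: nonzero if the variable's value asks for the empty stubs. */
int
nopthread_requested(const char *value)
{
	return value != NULL && value[0] != '\0';
}

/* Hypothetical: consult the environment once, e.g. at startup. */
int
keep_empty_stubs(void)
{
	return nopthread_requested(getenv("LD_NOPTHREAD"));
}
```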
	Of course the quick fix is to have ld.elf_so always load libpthread
if dlopen() is used.