Subject: Re: linux signal delivery & clone() bug
To: None <tech-kern@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: tech-kern
Date: 11/03/2000 22:11:38
Jaromír Dolecek <dolecek@ibis.cz>  wrote:

> Matthias Scheler wrote:
> > On Wed, Nov 01, 2000 at 11:57:23AM +1100, Simon Burge wrote:
> > > I haven't yet tried the patch in that PR to see if it fixes say mtv as
> > > described in kern/10101.
> > 
> > Unfortunately it does not. But there might be a related problem in another
> > system call which causes the "mtv" problem.
> 
> The issue described in kern/10981 is also fixed with my version of the
> patch (which has some cosmetic differences from the code in the PR).
> I've also tried sharing only the signal handler function array (I
> moved sa_mask and sa_flags to a separate array), but that didn't help
> mtv either. So kern/10981 is probably a different bug - fixing it
> requires changing struct proc, which is unfortunate, but IMHO it needs
> to be done (and pulled up to the 1.5 branch).

I tried a subset of this patch on my i386 (i.e., only the MI and i386
parts), and I don't see what I would call proper behaviour.  When I run
the test program from the PR compiled natively against the pthreads
package, I see each "Thread N has pid X" line 1 second apart, and the
program exits cleanly.  With a Linux cross-compiled program (using the
suse_devel package) I see all "Thread ..." messages at the same time
but the program doesn't exit.  With a patched kernel, I still see all
the "Thread ..." messages at the same time but the program does exit
cleanly.

From reading the pthread_join() man page, the NetBSD native behaviour
is correct - each new thread should wait for the previous thread to
finish before it starts executing.  Note that I've done absolutely no
thread programming so take my interpretation with a small grain of salt.
Either way, since the NetBSD native and emulated Linux behaviours
differ, I would have to say the patch doesn't fix the bug (or there is
a problem with the NetBSD native pthreads).
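
To make the timing I'm describing concrete, here is a minimal sketch of
a program that creates and joins its threads one at a time.  It is NOT
the test program from the PR (whose structure I'm only assuming); the
names worker() and run_threads_serially() are made up for illustration.
If each thread is joined before the next is created, the "Thread N has
pid X" lines should appear about 1 second apart.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical worker: report which process the thread runs in, then
 * pause so the spacing between output lines is visible. */
static void *worker(void *arg)
{
    int n = *(int *)arg;

    printf("Thread %d has pid %ld\n", n, (long)getpid());
    fflush(stdout);
    sleep(1);
    return NULL;
}

/* Create and join `count` threads one at a time.  Returns the number
 * of threads that were both created and joined successfully. */
int run_threads_serially(int count)
{
    int done = 0;

    for (int i = 0; i < count; i++) {
        pthread_t t;
        int id = i;   /* stays valid: we join before leaving this scope */

        if (pthread_create(&t, NULL, worker, &id) != 0)
            break;
        if (pthread_join(t, NULL) != 0)
            break;
        done++;
    }
    return done;
}
```

Calling run_threads_serially(4) from main() and comparing the spacing
of the output on a native build and under Linux emulation would show
whether the two behave the same for this simple case.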

> It looks to me like Linux is using a quite different mechanism
> of signal delivery and queuing, which LinuxThreads can depend on.
> I haven't dug too deep into the Linux code yet and I don't know
> all the details of NetBSD signal handling. But it looks to me
> like only one signal of any given type can be pending
> on NetBSD, but there can be a couple of them on Linux. If you
> send multiple signals to a process under Linux, the signal
> gets delivered multiple times, but this might not be the case
> on NetBSD if the process hasn't had a chance to run
> the signal handler since the previous signal of the given type
> was received. If this is true, it might be one of the potential
> problems - unfortunately changing this is not particularly easy :(

This suggests that the patch in PR kern/4821 will not fix the mtv
problem either then - do you agree with that?

Simon.
--
Simon Burge                            <simonb@wasabisystems.com>
NetBSD Sales, Support and Service:  http://www.wasabisystems.com/