Subject: Re: linux signal delivery & clone() bug
To: Simon Burge <simonb@wasabisystems.com>
From: Matthew Orgass <darkstar@pgh.net>
List: tech-kern
Date: 11/03/2000 10:31:33
On Fri, 3 Nov 2000, Simon Burge wrote:
> Jaromír Dolecek <dolecek@ibis.cz> wrote:
> > The issue described in kern/10981 is fixed also with my version of patch
> > (which has some cosmetic differences to the code in the PR).  I've also
> > tried to make shared only the signal handler function array (I
> > moved sa_mask and sa_flags to separate array), but that didn't help mtv

  sa_mask and sa_flags are per process, not per thread, since they are
part of the signal handler.

> I tried a subset of this patch on my i386 (ie, only the MI and i386
> parts), and I don't see what I would call proper behaviour.  When I try
> the test program from the PR compiled natively against the pthreads
> package, I see each "Thread N has pid X" line 1 second apart, and the
> program exit cleanly.  With a Linux cross-compiled program (using the
> suse_devel package) I see all "Thread ..." messages at the same time
> but the program doesn't exit.  With a patched kernel, I still see all
> the "Thread ..." messages at the same time but the program does exit
> cleanly.
>
> From reading the pthread_join() man page, the NetBSD native behaviour
> is correct - each new thread should wait for the previous thread to
> finish before it starts executing.

  The Linux behavior is correct: the test program creates all of the
threads first and only then joins them, so all of the threads should be
sleeping at the same time.  This is also the behavior when compiled with
the PTL library on NetBSD.
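
  To make the create-then-join point concrete, here is a minimal sketch
using Python's threading module rather than pthreads.  It is not the test
program from the PR; run_threads and its parameters are made up for
illustration.  It only assumes the shape described above: start every
thread, then join every thread.

```python
import threading
import time


def run_threads(count=4, nap=0.5):
    """Start `count` sleeping threads, then join them all.

    Returns (elapsed, spread): total wall time, and the gap between the
    first and last thread actually starting to run.
    """
    started = []
    lock = threading.Lock()

    def worker():
        # Record when this thread begins, then sleep like the test threads.
        with lock:
            started.append(time.monotonic())
        time.sleep(nap)

    begin = time.monotonic()
    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:   # all creates first ...
        t.start()
    for t in threads:   # ... then all joins
        t.join()
    elapsed = time.monotonic() - begin
    return elapsed, max(started) - min(started)


if __name__ == "__main__":
    elapsed, spread = run_threads()
    print(f"elapsed={elapsed:.2f}s spread={spread:.2f}s")
```

  If the threads really run concurrently, the elapsed time should be close
to one sleep interval rather than count times that interval, and the
threads should all start within a small fraction of it.  Joining only waits
for a thread to finish; it does not hold back threads that were already
created.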

> Note that I've done absolutely no thread programming so take my
> interpretation with a small grain of salt.  Either way, since there is a
> difference between the netbsd and emulated Linux behaviour I would have
> to say the patch doesn't fix the bug (or there is a problem with the
> NetBSD native pthreads).

  This is a bug in the NetBSD native pthreads, not the Linux emulation.

> > It looks to me like Linux is using quite different mechanism
> > of signal delivery and queueing, which LinuxThreads can depend on.
> > I haven't dug too deep into Linux code yet and I don't know
> > all the details of NetBSD signal handling. But it looks to me
> > like only one signal of any given type can be pending
> > on NetBSD, but there can be couple of them on Linux. If you
> > send multiple signals to a process under Linux, the signal
> > gets delivered multiple times, but this might not be case
> > on NetBSD if the process haven't had a chance to run
> > the signal handler since the previous signal of given type
> > was received. If this is true, it might be one of potential
> > problems - unfortunately changing this is not particularly easy :(

  Linux supports POSIX.1b signals (see
http://www.technion.ac.il/guides/osf_doc/APS33DTE/DOCU_006.HTM ).

  These signals carry more information than POSIX.1 signals, which may be
the immediate problem since it looks like NetBSD does not even try to
provide the info.  At least with Blackdown JDK 1.2.2 native threads there
is never more than one signal pending at a time so queueing would not be
the problem.  Filling as much of siginfo as possible might be enough to
get things going, at least for now.
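
  For anyone who wants to see what that extra information looks like from
user space, here is a rough sketch using Python's signal module on a
current Linux system.  It says nothing about the NetBSD kernel or the Linux
emulation code, and drain_pending is a name invented for this example.  It
blocks a standard signal, sends it to the calling thread twice, and then
collects whatever the kernel kept pending along with its siginfo.

```python
import os
import signal
import threading


def drain_pending(signo=signal.SIGUSR1):
    """Send `signo` to this thread twice while blocked, then dequeue it.

    Returns the list of siginfo records that sigtimedwait() hands back.
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signo})
    try:
        me = threading.get_ident()
        signal.pthread_kill(me, signo)
        signal.pthread_kill(me, signo)

        infos = []
        while True:
            # Timeout 0 polls: returns None once nothing is pending.
            info = signal.sigtimedwait({signo}, 0)
            if info is None:
                break
            infos.append(info)
        return infos
    finally:
        # Everything pending was drained above, so unblocking is safe.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    for info in drain_pending():
        print(info.si_signo, info.si_code, info.si_pid, info.si_uid)
    print("our pid:", os.getpid())
```

  On Linux I would expect a single record for a standard signal such as
SIGUSR1, since a second instance sent while the first is still pending is
not queued, and that record carries the sender's pid and uid in si_pid and
si_uid.  Those are the kind of fields that a program written against
POSIX.1b signals may be relying on.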

  In the longer term, NetBSD needs to implement POSIX.1b signals.  If no
one else is working on this I would be happy to take a shot at it (at
least the MI, i386, and mips parts).  However, I am not too familiar with
what exactly is going on at that low level so it may take some time for me
to figure things out.  I do have some time to spend on it, but if anyone
wants it done soon they should do it.

Matthew Orgass
darkstar@pgh.net