Subject: Re: kern/24829
To: Chuck Silvers <chuq@chuq.com>
From: Christos Zoulas <christos@zoulas.com>
List: netbsd-bugs
Date: 11/09/2005 12:28:19
On Nov 9,  8:55am, chuq@chuq.com (Chuck Silvers) wrote:
-- Subject: Re: kern/24829

| On Tue, Nov 08, 2005 at 10:00:47AM +0100, Jarle Greipsland wrote:
| > Jarle Greipsland <jarle@uninett.no> writes:
| > > It is still there.  On a 3.0_BETA kernel from around October
| > > 20th, I still got the same panic.  It is not the same system
| > > for which the original problem report was filed, but the current
| > > system is also a quad-cpu i386-family system.  Console log below.
| > > Please let me know if there is any other information you want me
| > > to try and gather.
| > Some more data.  I briefly looked in the apache web server logs,
| > and noticed some pthread-related messages.  I don't know whether
| > they are related to the panic or not.  Log messages below.
| > 
| > 					-jarle
| > 
| > [Tue Nov 08 08:27:39 2005] [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
| > [Tue Nov 08 08:27:42 2005] [notice] Digest: generating secret for digest authentication ...
| > [Tue Nov 08 08:27:42 2005] [notice] Digest: done
| > [Tue Nov 08 08:27:42 2005] [notice] Apache/2.0.55 (Unix) mod_ssl/2.0.55 OpenSSL/0.9.7d DAV/2 configured -- resuming normal operations
| > [Tue Nov 08 08:27:47 2005] [notice] child pid 19659 exit signal Segmentation fault (11)
| > assertion "unreachable" failed: file "/usr/src/lib/libpthread/pthread.c", line 622, function "pthread_exit"
| > [Tue Nov 08 08:27:59 2005] [notice] child pid 5309 exit signal Abort trap (6)
| 
| these "unreachable" assertions should be fixed with my recent changes to
| libpthread.  those have been applied to -current and the 3.x branch so far,
| are you running with the latest libpthread?
| 
| 
| > assertion "target->pt_state != PT_STATE_RUNNING || target->pt_blockgen != target->pt_unblockgen" failed: file "/usr/src/lib/libpthread/pthread_sig.c", line 812, function "pthread__kill"
| 
| this is because the libpthread code does not yet support running with
| PTHREAD_CONCURRENCY > 1, as evidenced by the comment right before the
| assertion:
| 
| 	/*
| 	 * Ensure the victim is not running.
| 	 * In a MP world, it could be on another processor somewhere.
| 	 *
| 	 * XXX As long as this is uniprocessor, encountering a running
| 	 * target process is a bug.
| 	 */
| 	pthread__assert(target->pt_state != PT_STATE_RUNNING ||
| 		target->pt_blockgen != target->pt_unblockgen);
| 
| 
| I was hoping that at least the kernel would survive this configuration,
| but apparently not.  I'll see if I can figure out how to avoid the crash,
| but if we can't fix it very quickly then we should consider disabling
| PTHREAD_CONCURRENCY > 1 for the 3.0 release.

I've done a lot of work on this, and I have this particular test case
almost working. But there are a large number of places in the kernel with
"XXX multiprocessor LWPs? Implement me!" or equivalent, so there is
definitely no hope of getting PTHREAD_CONCURRENCY working properly for 3.0.
We should document this clearly. I really want to get PTHREAD_CONCURRENCY
working for 4.0, but it is not an easy goal.
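
For illustration only, the condition checked by the pthread__kill
assertion quoted above can be written as a standalone predicate. This is
a minimal sketch, not the libpthread source: the struct, the enum values
and the function names are hypothetical, and only the identifiers
PT_STATE_RUNNING, pt_state, pt_blockgen and pt_unblockgen are taken from
the quoted assertion. What differing generation counters mean is not
stated in this thread, so the sketch does not assume it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real values live in libpthread. */
enum { PT_STATE_RUNNING = 1, PT_STATE_NOT_RUNNING = 2 };

/* Hypothetical reduced thread structure with only the fields
 * that the quoted assertion looks at. */
struct sketch_thread {
	int pt_state;
	int pt_blockgen;
	int pt_unblockgen;
};

/* Same boolean expression as the quoted pthread__assert(): true when
 * the target is not marked running, or when its block and unblock
 * generation counters differ. */
bool
assertion_holds(const struct sketch_thread *target)
{
	return target->pt_state != PT_STATE_RUNNING ||
	    target->pt_blockgen != target->pt_unblockgen;
}

/* Walks through the three possible cases. */
void
demo(void)
{
	struct sketch_thread idle = { PT_STATE_NOT_RUNNING, 0, 0 };
	struct sketch_thread differ = { PT_STATE_RUNNING, 5, 4 };
	struct sketch_thread running = { PT_STATE_RUNNING, 5, 5 };

	assert(assertion_holds(&idle));     /* not running: passes */
	assert(assertion_holds(&differ));   /* counters differ: passes */
	assert(!assertion_holds(&running)); /* the case that aborted */
}
```

The third case is the one that matters here: a target that is marked
running with equal generation counters, which the quoted comment treats
as a bug only "as long as this is uniprocessor".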

christos