Subject: Re: The dreaded thread bug [was Re: Stable again?]
To: Geoff Adams <gsa-netbsd@alldestroying.com>
From: Martin Husemann <martin@duskware.de>
List: port-sparc64
Date: 10/28/2006 12:39:39
On Fri, Oct 27, 2006 at 08:02:25PM -0400, Geoff Adams wrote:
> And, of course, the big question looming in my mind:
> 
> - If this is related to trap handling, why does this happen only when  
> executing threaded processes?

For one, the underlying bug might be triggered only during an upcall, and
other archs may avoid it because they touch their stack a lot earlier.

Another pointer is PR 33075, which seems to indicate that not only
SA-threaded applications are affected. Last I tried (it's been a while) I
could reproduce that PR on a U2 running a GENERIC kernel, but not on the same
machine running a more stripped-down kernel.

I would suggest attacking that PR first, as it is probably simpler to analyze
(it does not require any SA knowledge).

Martin
P.S.: I have recently been running firefox on sparc64 for > 8 hours without
a single crash. This used to be a pretty reliable way to reproduce the
"all %l registers are zero after return from syscall" problem.