Subject: Re: help with nathanw_sa branch on sh3
To: None <port-sh3@netbsd.org>
From: Valeriy E. Ushakov <uwe@ptc.spbu.ru>
List: port-sh3
Date: 11/14/2003 06:36:24
On Wed, Dec 18, 2002 at 14:13:28 -0800, Jason R Thorpe wrote:

> I've been trying to track down lossage with the nathanw_sa branch
> on the sh3 (well, my test system is really an sh4 [dreamcast], but
> you get the idea :-)
> 
> The problem is most easily produced with the "sa3" test (which I
> have attached at the end of this message).  Running the test will
> cause an assertion failure in tlb_exception() if you are running
> a DEBUG kernel:
> 
> 	if (usermode) {
> 		KDASSERT(l->l_md.md_regs == tf);
> 	}
> 
> ...here is what I know about the problem so far:
> 
> 	* Process has 2 LWPs at the time of assertion failure.
> 
> 	* LWP1 l_md.md_regs == tf1-address
> 
> 	* LWP2 l_md.md_regs == tf2-address
> 
> 	* When the assertion fails, LWP1 is the active LWP, but the
> 	  tf passed to tlb_exception() is tf2-address.
> 
> ...how this can happen, I just don't know.  I guess somehow r6_bank
> (though the saved value in the PCB looks fine ??) for LWP1 is getting
> the wrong value stuffed into it at some point, but I'm not really sure
> how that is happening.

A -current DEBUG kernel trips on an assertion in cpu_lwp_fork():

	KDASSERT(!(l1 != curlwp && l1 != &lwp0));

At this point curlwp is lwp #1 of sa3, which is in nanosleep (it
called sleep in the upcall); it is in LSSLEEP and its l_flags is
0x500084.  But l1 is lwp #2 of sa3, which is in LSRUN with l_flags
0xa00004.
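
To spell out what that check requires, here is a minimal userland
sketch, not kernel code.  struct lwp_model and all the function names
are made up for illustration, and the lid given to lwp0 is an
arbitrary placeholder; only the l_flags values and the roles of
curlwp and l1 come from the observations above.  By De Morgan, the
condition as written is the same as "l1 == curlwp || l1 == &lwp0",
and the observed state satisfies neither half.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct lwp; only what the sketch needs. */
struct lwp_model {
	int lid;		/* LWP number within the process */
	unsigned int flags;	/* value reported for l_flags */
};

/* The condition exactly as it is written in the assertion. */
static bool
fork_check_as_written(const struct lwp_model *l1,
    const struct lwp_model *cur, const struct lwp_model *lwp0)
{
	return !(l1 != cur && l1 != lwp0);
}

/* The same condition after applying De Morgan's law. */
static bool
fork_check_simplified(const struct lwp_model *l1,
    const struct lwp_model *cur, const struct lwp_model *lwp0)
{
	return l1 == cur || l1 == lwp0;
}

/*
 * Model the observed state: curlwp is lwp #1 of sa3, l1 is lwp #2.
 * Returns true if that state would pass the check.
 */
static bool
reported_state_passes(void)
{
	struct lwp_model lwp0 = { 0, 0 };	/* placeholder values */
	struct lwp_model sa3_lwp1 = { 1, 0x500084 };
	struct lwp_model sa3_lwp2 = { 2, 0xa00004 };
	const struct lwp_model *cur = &sa3_lwp1;
	const struct lwp_model *l1 = &sa3_lwp2;

	bool written = fork_check_as_written(l1, cur, &lwp0);
	bool simplified = fork_check_simplified(l1, cur, &lwp0);

	/* Both spellings of the condition must agree. */
	assert(written == simplified);
	return written;
}
```

In this model reported_state_passes() returns false, which matches
the assertion firing: the fork is being done on behalf of an lwp that
is neither the current one nor lwp0.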

Also, at first glance l2 doesn't look like an address from the
pool of lwps.

I haven't done much research beyond doing some ps/l and staring at a
couple of hex dumps.  I'm interested in debugging this problem, but I
could use some help from someone familiar with the innards of sa/lwp.

SY, Uwe
-- 
uwe@ptc.spbu.ru                         |       Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/            |       Ist zu Grunde gehen