Port-alpha archive


Re: can't reboot after running a 5.0 kernel



On Sun, Jan 30, 2011 at 12:23:25PM +0100, Martin Husemann wrote:
> I'll have to dig through old PRs and (unfortunately not very clear) commit
> messages.
> 
> Can you please file a PR? We should make up our mind if cpu_setfunc is
> supposed to call lwp_startup() or not, make sure all ports do it consistently,
> and find out if the SA compat code needs fixing (e.g. arrange for a call to
> lwp_startup by other means). Or backout all the changes to various ports
> that introduced the separate trampoline.

Hi-

    I filed a PR for this, entitled:

4.0 sa threaded apps hard hang netbsd-5 and HEAD kernels on some ports 
[cpu_setfunc() related]

        http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=44500


I got some additional information in the process...

First, the hard hang occurs in mi_switch() in the following loop (I 
added the debug printf):

                /*
                 * We may need to spin-wait for if 'newl' is still
                 * context switching on another CPU.
                 */
                if (newl->l_ctxswtch != 0) {
                        u_int count;
                        count = SPINLOCK_BACKOFF_MIN;
                        while (newl->l_ctxswtch) {
                                SPINLOCK_BACKOFF(count);
printf("POINTA\n");  /*XXXCDC*/
                        }
                }

It just prints "POINTA" endlessly; it never exits that loop.  Note that
my system has only one CPU, so the case the comment is guarding against
(another CPU still switching 'newl') does not apply.  Because interrupts
are disabled, you cannot break into DDB while stuck in that while()
loop; the system is hung, which is why you have to power cycle.


I also surveyed some of the ports in the tree, and it looks like some
ports' cpu_setfunc() still calls lwp_startup(), while others (like
alpha) have been modified not to call it:

arch     cpu_setfunc calls        calls lwp_startup?  when changed?
-------  -----------------------  ------------------  ------------------------------
acorn26  lwp_trampoline           yes
alpha    setfunc_trampoline       no                  vm_machdep.c 1.100, 2009/06/01
arm32    lwp_trampoline           yes
hppa     setfunc_trampoline       no                  vm_machdep.c 1.36, 2009/06/03
m68k     setfunc_trampoline       no                  vm_machdep.c 1.28, 2009/05/30
mips     setfunc_trampoline       no                  vm_machdep.c 1.123, 2009/05/30
powerpc  setfunc_trampoline       no                  vm_machdep.c 1.77, 2009/06/07
sh3      lwp_setfunc_trampoline   no                  (never called lwp_startup?)
sparc    lwp_setfunc_trampoline   no                  vm_machdep.c 1.100, 2009/05/29
sparc64  lwp_setfunc_trampoline   no                  vm_machdep.c 1.89, 2009/05/30
x86      lwp_trampoline           yes

I think the "no" ports are likely to have problems with compat_sa
binaries.


The most interesting ones are sh3 (because it didn't get the change
in 2009) and mrg's commit message on sparc (because it is the
earliest instance of this change, dated 2009/05/29):

----------------------------
revision 1.100
date: 2009/05/29 22:06:56;  author: mrg;  state: Exp;  lines: +11 -5
fix up cpu_setfunc() as noted by uwe:

- don't call lwp_startup for cpu_setfunc() users
- introduce lwp_setfunc_trampoline instead
- no need to set the "new" lwp for setfunc
----------------------------


But I couldn't find the original note from uwe@netbsd that mrg's
commit message refers to.


chuck

