Port-alpha archive


Re: can't reboot after running a 5.0 kernel



On Fri, Jan 28, 2011 at 07:27:14PM -0500, Chuck Cranor wrote:
> On Sat, Jan 29, 2011 at 12:43:12AM +0100, Martin Husemann wrote:
> > I'm not sure I see what you mean - the code in -current looks pretty much
> > identical to me.


hi-

    I looked a bit more at the changes you submitted for pullup
in netbsd-5 ticket 798:

        http://releng.netbsd.org/cgi-bin/req-5.cgi?show=798

I don't know enough about that code to understand why you wanted
that change, but I was able to factor it out a bit anyway....


    The changes to vm_machdep.c appear to have removed the
call to cpu_setfunc() from cpu_lwp_fork() and replaced it
with the actual content of the old cpu_setfunc() function.
The net result here is that the behavior of cpu_lwp_fork() 
does not change, but it no longer calls cpu_setfunc().

    The old cpu_setfunc() is now replaced with a new stripped-down
version that calls setfunc_trampoline() instead of 
lwp_trampoline()  [the s3 register is no longer set up or used].
The only thing that calls cpu_setfunc() now is compat_sa.c
( cpu_lwp_fork() no longer calls it ).

    The main difference between lwp_trampoline() and the new
setfunc_trampoline() is that setfunc_trampoline() no longer
calls lwp_startup().   Removing the call to lwp_startup() causes
the alpha to hang hard if you run a 4.0 threaded app like "dig"...

    So, lwp_startup() does something that keeps the system from
hanging.   To figure out what that was, I started adding bits
of lwp_startup() into setfunc_trampoline() until the system
stopped hanging.   It turns out the two critical bits are:

void
xlwp_startup(struct lwp *prev, struct lwp *new)
{
        if (prev != NULL) {
                curcpu()->ci_mtx_count++;  /*YES*/
                prev->l_ctxswtch = 0;      /*YES*/
        }
}

    Put that much of lwp_startup() back into setfunc_trampoline(), and 
the system no longer hangs when you run "dig".   A complete diff
that applies to a netbsd-5 branch checked out on date 10-Jun-2009
(e.g. with "cvs -q update -r netbsd-5 -dP -D 10-Jun-2009") is included
at the end.

    You need both the l_ctxswtch and ci_mtx_count statements.
If you comment out the l_ctxswtch statement, the system hangs
as soon as you run "dig".    If you comment out the ci_mtx_count
statement, the system runs "dig" (it prints an error message to
the console) but then hangs when "dig" exits.   I couldn't get
into DDB in either case.
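
    To make the role of the two statements concrete, here is a rough
userspace model of the patched trampoline.  It is only a sketch, not
kernel code: the model_* names are hypothetical, curcpu() is replaced
by a single global, and model_switch_away() encodes an ASSUMPTION (not
stated in this message) that the context-switch path marks the outgoing
LWP with l_ctxswtch = 1 and decrements ci_mtx_count before the new LWP
starts.  That s0 holds the function to call is inferred from the
"mov s0, pv" in the trampoline.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; only the two fields xlwp_startup() touches. */
struct model_cpu { int ci_mtx_count; };
struct model_lwp { int l_ctxswtch; };

/* Stand-in for curcpu(): the model has exactly one CPU. */
static struct model_cpu model_cpu0;

typedef void (*model_func_t)(void *);

/* The saved registers the patched trampoline reads (s1/ra omitted). */
struct model_ctx {
	model_func_t s0_func;        /* s0: moved to pv (inferred) */
	void *s2_arg;                /* s2: moved to a0 for the function */
	struct model_lwp *s3_lwp;    /* s3: moved to a1 for xlwp_startup() */
};

/*
 * ASSUMPTION: what the switching side does to the outgoing LWP and
 * the CPU before the new LWP runs.  Not taken from this message.
 */
static void
model_switch_away(struct model_lwp *prev)
{
	prev->l_ctxswtch = 1;
	model_cpu0.ci_mtx_count--;
}

/* Same body as xlwp_startup() in the diff, against the model types. */
static void
model_xlwp_startup(struct model_lwp *prev, struct model_lwp *new)
{
	(void)new;	/* unused, as in the patch */
	if (prev != NULL) {
		model_cpu0.ci_mtx_count++;
		prev->l_ctxswtch = 0;
	}
}

/*
 * Model of the patched setfunc_trampoline: v0 (the previous LWP) and
 * s3 are passed to xlwp_startup(), then the function in s0 is called
 * with the argument from s2.
 */
static void
model_setfunc_trampoline(struct model_lwp *v0_prev,
    const struct model_ctx *ctx)
{
	assert(ctx != NULL && ctx->s0_func != NULL);
	model_xlwp_startup(v0_prev, ctx->s3_lwp);
	ctx->s0_func(ctx->s2_arg);
}

/* Hypothetical entry point: records that it ran. */
static void
model_entry(void *arg)
{
	*(int *)arg = 1;
}
```

    In this model, calling model_switch_away() on an LWP and then
running model_setfunc_trampoline() with that LWP as "prev" leaves
l_ctxswtch cleared and ci_mtx_count back at its starting value.
Dropping either statement from model_xlwp_startup() leaves one of the
two out of balance, which would be consistent with the two different
hangs described above.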

    What parts of lwp_startup() are you trying to avoid?

chuck


Index: arch/alpha/alpha/locore.s
===================================================================
RCS file: /cvsroot/src/sys/arch/alpha/alpha/locore.s,v
retrieving revision 1.113.10.1
diff -u -r1.113.10.1 locore.s
--- arch/alpha/alpha/locore.s   9 Jun 2009 17:38:38 -0000       1.113.10.1
+++ arch/alpha/alpha/locore.s   30 Jan 2011 03:47:33 -0000
@@ -752,6 +752,9 @@
  * Simplified version of above: don't call lwp_startup()
  */
 LEAF_NOPROFILE(setfunc_trampoline, 0)
+       mov     v0, a0   /* NEW */
+       mov     s3, a1   /* NEW */
+       CALL(xlwp_startup)   /* NEW */
        mov     s0, pv
        mov     s1, ra
        mov     s2, a0
Index: arch/alpha/alpha/vm_machdep.c
===================================================================
RCS file: /cvsroot/src/sys/arch/alpha/alpha/vm_machdep.c,v
retrieving revision 1.96.30.1
diff -u -r1.96.30.1 vm_machdep.c
--- arch/alpha/alpha/vm_machdep.c       9 Jun 2009 17:38:39 -0000       1.96.30.1
+++ arch/alpha/alpha/vm_machdep.c       30 Jan 2011 03:47:33 -0000
@@ -228,6 +228,8 @@
            (u_int64_t)exception_return;        /* s1: ra */
        up->u_pcb.pcb_context[2] =
            (u_int64_t)arg;                     /* s2: arg */
+       up->u_pcb.pcb_context[3] =
+           (u_int64_t)l;                       /* s3: lwp */
        up->u_pcb.pcb_context[7] =
            (u_int64_t)setfunc_trampoline;      /* ra: assembly magic */
 }      
Index: kern/kern_lwp.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_lwp.c,v
retrieving revision 1.126.2.2
diff -u -r1.126.2.2 kern_lwp.c
--- kern/kern_lwp.c     8 Mar 2009 03:15:36 -0000       1.126.2.2
+++ kern/kern_lwp.c     30 Jan 2011 03:48:08 -0000
@@ -706,6 +706,22 @@
        }
 }
 
+
+/*
+ * Called by MD code when a new LWP begins execution.  Must be called
+ * with the previous LWP locked (so at splsched), or if there is no
+ * previous LWP, at splsched.
+ */
+void xlwp_startup(struct lwp *prev, struct lwp *new);
+void
+xlwp_startup(struct lwp *prev, struct lwp *new)
+{
+       if (prev != NULL) {
+               curcpu()->ci_mtx_count++;  /*YES*/
+               prev->l_ctxswtch = 0;      /*YES*/
+       }
+}
+
 /*
  * Exit an LWP.
  */


