Subject: Spontaneous reboots
To: None <tech-kern@netbsd.org>
From: Brian de Alwis <bsd@cs.ubc.ca>
List: tech-kern
Date: 06/26/2003 16:44:04
Forgive this post; I haven't yet subscribed to tech-kern.  But I
suspect this is a more appropriate list than current-users.

While spending a tiny bit of time trying to figure out the remrunqueue
panics, I noticed that the C version in /sys/kern/kern_synch.c doesn't
actually NULL out the lwp's l_forw pointer.

To check the suspicion that something was perhaps accidentally
following this pointer, I explicitly NULLed this pointer out too.

    --- kern_synch.c        2003/05/20 13:48:08     1.128
    +++ kern_synch.c        2003/06/26 23:37:30
    @@ -1210,10 +1210,11 @@
		    panic("remrunqueue");
     #endif
	    prev = l->l_back;
    -       l->l_back = NULL;
	    next = l->l_forw;
	    prev->l_forw = next;
	    next->l_back = prev;
    +       l->l_back = NULL;
    +       l->l_forw = NULL;
	    if (prev == next)
		    sched_whichqs &= ~(1 << whichq);
     }
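
For anyone who wants to poke at the unlink logic outside the kernel,
here is a minimal userland sketch of the same pattern.  It is not the
kernel code: the sketch_* names, the struct layout, the 32-queue table,
and the priority / 4 queue selection are all my assumptions for
illustration.  It only shows a circular doubly-linked queue with a
sentinel head, where the removed entry gets both of its links cleared
and the queue's bit is dropped once the queue is empty.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NQS 32                  /* assumed number of run queues */

/* Hypothetical stand-in for the two lwp fields the patch touches. */
struct lwp_sketch {
	struct lwp_sketch *l_forw;
	struct lwp_sketch *l_back;
	int l_priority;
};

/* Each queue head is a sentinel linked to itself while the queue is empty. */
struct lwp_sketch sketch_qs[SKETCH_NQS];
unsigned int sketch_whichqs;           /* bit n set => queue n is non-empty */

void
sketch_rqinit(void)
{
	int i;

	for (i = 0; i < SKETCH_NQS; i++)
		sketch_qs[i].l_forw = sketch_qs[i].l_back = &sketch_qs[i];
	sketch_whichqs = 0;
}

/* Append l to the tail of its queue and mark that queue non-empty. */
void
sketch_setrunqueue(struct lwp_sketch *l)
{
	int whichq = l->l_priority / 4;    /* assumed priority-to-queue mapping */
	struct lwp_sketch *head = &sketch_qs[whichq];
	struct lwp_sketch *prev = head->l_back;

	l->l_forw = head;
	l->l_back = prev;
	prev->l_forw = l;
	head->l_back = l;
	sketch_whichqs |= 1u << whichq;
}

/* Unlink l, clear BOTH of its links, and drop the bit if the queue emptied. */
void
sketch_remrunqueue(struct lwp_sketch *l)
{
	int whichq = l->l_priority / 4;
	struct lwp_sketch *prev, *next;

	assert(sketch_whichqs & (1u << whichq));
	prev = l->l_back;
	next = l->l_forw;
	prev->l_forw = next;
	next->l_back = prev;
	l->l_back = NULL;
	l->l_forw = NULL;
	if (prev == next)                  /* both are the sentinel: queue empty */
		sketch_whichqs &= ~(1u << whichq);
}
```

With both links cleared, anything that follows a stale pointer from a
removed entry dereferences NULL immediately, rather than walking back
into a queue the entry no longer belongs to.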

This seems to make sense to me, but I've since noticed several
spontaneous reboots, with nothing written to /var/log/messages.  That
is of little surprise, since remrunqueue is pretty low-level.

Note that I never get dumped to DDB, as I set ddb.onpanic=0 since
I run X normally; but given that previous panics always logged something
to /var/log/messages, I don't know if that would have happened either.

I hope this little data point might help.

-- 
     Brian de Alwis | Graduate student | Software Practices Lab | UBC
"Passivity & cynicism have always come easily to the educated." - Ed Broadbent