Subject: Problem with -current from 4.99.20...
To: VAX porting list <port-vax@netbsd.org>
From: Johnny Billquist <bqt@softjar.se>
List: port-vax
Date: 07/24/2007 14:36:51
I'm trying to figure out how NetBSD broke when 4.99.20 was introduced,
but I haven't gotten very far yet, nor have I had much time.
But I thought I'd pose the question here, and maybe someone else with
more knowledge can look at it and perhaps give some feedback and ideas.
What happens is that the system runs fine for quite a while. My test
scenario involves running build.sh, which will cause the system to crash
after several hours. So it's not quick to reproduce, nor easy to isolate
from that point of view.
The hardware is a VAX 4000/90 with 128 megs of memory.
From my observations, it seems the problems start when the machine runs
out of memory and begins allocating swap space.
I have the machine sitting in ddb after the crash right now, and here
is some relevant info:
-----------------------------------------------------------------
login: panic: Segv in kernel mode: pc 801b00f6 addr 4
Stopped in pid 0.4 (system) at netbsd:trap+0x4fc: movl $1, -64(fp)
db> bt
panic: Segv in kernel mode: pc %x addr %x
Stack traceback :
0x8c01bd34: trap+0x4fc(0x8c01bdfc)
0x8c01bdfc: trap type=0x8 code=0x4 pc=0x801b00f6 psl=0x4
0x8c01bdc8: pmap_deactivate+0x1a(0x84fab100)
0x8c01be4c: cpu_swapout+0x1f(0x84fab100)
0x8c01be6c: uvm_swapout+0x50(0x84fab100)
0x8c01be90: uvm_swapout_threads+0x114(void)
0x8c01bec8: uvm_pageout+0x216(0x87f41820)
0x8c01bf64: cpu_lwp_bootstrap+0x15(0)
db> ps
  PID  PPID  PGRP   UID S   FLAGS LWPS COMMAND WAIT
19923   363   442     0 2  0x4000    1 cc1
  363  6755   442     0 2  0x4000    1 cc      wait
 6755  3067   442     0 2  0x4000    1 nbgmake wait
 3067 21197   442     0 2  0x4000    1 sh      wait
23358   388   388    12 2  0x4100    1 pickup  select
21197 19374   442     0 2  0x4000    1 nbgmake wait
19374 12938   442     0 2  0x4000    1 sh      wait
12938  3459   442     0 2  0x4000    1 nbmake  wait
 3459  8751   442     0 2  0x4000    1 sh      wait
 8751 10418   442     0 2  0x4000    1 nbmake  wait
10418 24288   442     0 2  0x4000    1 sh      wait
24288 17653   442     0 2  0x4000    1 nbmake  wait
17653 26565   442     0 2  0x4000    1 sh      wait
26565 24824   442     0 2  0x4000    1 nbmake  wait
24824  1895   442     0 2  0x4000    1 sh      wait
 1895   442   442     0 2  0x4000    1 nbmake  wait
   66   426    66     0 2  0x4000    1 tail    kqread
  442   426   442     0 2  0x4000    1 sh      wait
  426   404   426     0 2  0x4000    1 tcsh    pause
  404   400   404  2026 2  0x4000    1 tcsh    pause
  400   412   412  2026 2   0x100    1 sshd    select
  412   268   412     0 2  0x4101    1 sshd    netio
  413     1   413     0 2  0x4000    1 getty   ttyin
  409   388   388    12 2  0x4100    1 qmgr    select
  402     1   402     0 2       0    1 cron    nanoslp
  408     1   408     0 2       0    1 inetd   kqread
  388     1   388     0 2  0x4100    1 master  select
  268     1   268     0 2       0    1 sshd    select
  277     1   277     0 2       0    1 rwhod   select
  278     1   278     0 2       0    1 ntpd    pause
  169     1   169     0 2       0    1 ypbind  select
  168     1   168     0 2       0    1 rpcbind select
  102     1   102     0 2       0    1 syslogd kqread
    1     0     1     0 2  0x4001    1 init    wait
>   0    -1     0     0 2 0x20002   13 system  *
db>
--------------------------------------------------
I find it a bit interesting that this seems to happen more or less
directly after a fork, that it somehow gets into uvm_pageout (I haven't
figured out how it gets there), and that process 0 is the one active.
Is some kind of kernel thread created here? Or maybe activated for the
first time?
Anyone who knows more about these innards?
Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt@softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol