Subject: port-vax/12520: "panic: kernel stack invalid" with current on VAX
To: None <gnats-bugs@gnats.netbsd.org>
From: None <Thilo.Manske@HEH.Uni-Oldenburg.DE>
List: netbsd-bugs
Date: 04/01/2001 13:49:56
>Number:         12520
>Category:       port-vax
>Synopsis:       I easily get "panic: kernel stack invalid"
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-vax-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Apr 01 04:50:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Thilo Manske
>Release:        current since ~January (?); sorry, I don't remember exactly.
>Organization:
This is Thilo's Unix signature! Have fun with it.
>Environment:
	
System: 
	Kernels: GENERIC or custom kernels
	System 1: VT1300 (~VS 3100m30 without disc controller), 8MB RAM diskless
	System 2: VS 4000/VLC with 16MB RAM; it happens both when running diskless
		and when running completely "net-less".
	Compilers and userland (still) from the 1.5 release.
Architecture: vax
Machine: vax

Boot message from System 1 (with GENERIC kernel made from current sources):
VAXstation 3100/m{30,40}
cpu: KA41/42
cpu: Enabling primary cache, secondary cache
total memory = 8076 KB
avail memory = 4816 KB
using 126 buffers containing 504 KB of memory
mainbus0 (root)
vsbus0 at mainbus0
vsbus0: interrupt mask 8
le0 at vsbus0 csr 0x200e0000 vec 120 ipl 14 maskbit 5 buf 0x33d000-0x34cfff
le0: address 08:00:2b:16:d8:1c
le0: 32 receive buffers, 8 transmit buffers
dz0 at vsbus0 csr 0x200a0000 vec 304 ipl 14 maskbit 6
dz0: 4 lines
lkkbd0 at dz0
wskbd0 at lkkbd0
lkms0 at dz0
wsmouse0 at lkms0
smg0 at vsbus0 csr 0x200f0000 vec 104 ipl 14 maskbit 3
wsdisplay0 at smg0
wsdisplay0: screen 0-7 added (128x57, vt100 emulation)
boot device: le0
root on le0

Boot message from System 2 (with GENERIC kernel made from current sources):
MicroVAX 3100/m{30,40}
cpu: KA48
cpu: turning on floating point chip
total memory = 15996 KB
avail memory = 12056 KB
using 225 buffers containing 900 KB of memory
mainbus0 (root)
vsbus0 at mainbus0
vsbus0: 32K entry DMA SGMAP at PA 0x400000 (VA 0x80400000)
vsbus0: interrupt mask 0
le0 at vsbus0 csr 0x200e0000 vec 770 ipl 15 maskbit 1 buf 0x0-0xffff
le0: address 08:00:2b:32:0b:c6
le0: 32 receive buffers, 8 transmit buffers
dz0 at vsbus0 csr 0x200a0000 vec 124 ipl 15 maskbit 4
dz0: 4 lines
lkkbd0 at dz0
wskbd0 at lkkbd0
lkms0 at dz0
wsmouse0 at lkms0
asc0 at vsbus0 csr 0x200c0080 vec 774 ipl 15 maskbit 0
asc0: NCR53C94, 25MHz, SCSI ID 6
scsibus0 at asc0: 8 targets, 8 luns per target
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 3 lun 0: <DEC, RZ23L    (C) DEC, 2528> SCSI1 0/direct fixed
sd0(asc0:3:0): max sync rate 3.96MB/s
sd0: 116 MB, 1523 cyl, 4 head, 39 sec, 512 bytes/sect x 237588 sectors
boot device: le0
root on le0

>Description:
For about three months (or maybe even longer), every current kernel I have built
for System 1 (VT1300) dies during boot with something like:

[...]
Creating runtime link editor directory cache.
Clearing /tmp.
Starting timed.
panic: kernel stack invalid
Stopped in pid 1 (init) at      _trap+0x138:    tstl    64(r8)
db> trace/t
Process 1
  PCB contents:
        KSP = 0x85710e98
        ESP = 0x8570f064
        SSP = 0x80271200
        USP = 0x7ffffc70
        R[00] = 0x85723000    R[06] = 0x8038f000
        R[01] = 0x0002b918    R[07] = 0x00000000
        R[02] = 0x804021c0    R[08] = 0x8038f000
        R[03] = 0x004fb000    R[09] = 0x80271200
        R[04] = 0x00000000    R[10] = 0x8038f000
        R[05] = 0x00000000    R[11] = 0x00000000
        AP = 0x85710ecc
        FP = 0x85710ea0
        PC = 0x80023674
        PSL = 0xdf0008
        Trap frame pointer: 0x85710fb4
Stack traceback :
0x85710ea0: bpendtsleep+0x0(0x8038f000,0x120,0x8001b57d,0x0,0x0)
0x85710ed4: _sys_wait4+0x2b2(0x8038f000,0x85710f60,0xc)
0x85710f1c: _syscall+0xf5(0x7ffffc70)

Because a kernel compilation now takes a week (I don't know why; it used to
take a day with 1.5ALPHA*/BETA), I didn't investigate further.
(Sorry that I can't give a precise date for when this started to happen;
I accidentally deleted my last working current kernel :-( .)

This week I got System 2 (VS 4000/VLC), which stays up a little longer
but also panics later, e.g. when I run "make depend" in a kernel compilation
directory (or something similar):

depending the kern library objects
depending the compat library objects
panic: kernel stack invalid
Stopped in pid 190 (cron) at    _trap+0x138:    tstl    64(r8)
db> trace/t
Process 190
  PCB contents:
        KSP = 0x86063e80
        ESP = 0x86062064
        SSP = 0x8028fe00
        USP = 0x7ffffcac
        R[00] = 0x86054000    R[06] = 0x80f72008
        R[01] = 0x000302a0    R[07] = 0x00000001
        R[02] = 0x804a7de4    R[08] = 0x80f72008
        R[03] = 0x00a6b000    R[09] = 0x8028fe00
        R[04] = 0x00000000    R[10] = 0x8013558c
        R[05] = 0x00000000    R[11] = 0x00001771
        AP = 0x86063eb4
        FP = 0x86063e88
        PC = 0x80023674
        PSL = 0xdf0008
        Trap frame pointer: 0x86063fb4
Stack traceback :
0x86063e88: bpendtsleep+0x0(0x8013558c,0x120,0x8002581b,0x1771,0x86063f28)
0x86063ebc: _sys_nanosleep+0xac(0x80f72008,0x86063f60,0x84)
0x86063f20: _syscall+0xf5(0x7ffffcac)

(This usually happens in cron, syslogd or rwhod.)
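
Both tracebacks report the same saved PSL, 0xdf0008. To make that value easier to read, here is a minimal Python sketch that splits it into its fields, assuming the standard VAX PSL layout (condition codes N/Z/V/C in bits 3..0, IPL in bits 20..16, previous mode in bits 23..22, current mode in bits 25..24, interrupt-stack flag IS in bit 26). The names `bits` and `decode_psl` are hypothetical helpers for illustration, not NetBSD code.

```python
# Decode a VAX Processor Status Longword (PSL) into its architectural fields.
# Assumes the standard VAX PSL bit layout; this is only a reading aid for the
# ddb dumps, not code taken from the NetBSD kernel.

MODES = {0: "kernel", 1: "executive", 2: "supervisor", 3: "user"}


def bits(value: int, low: int, width: int) -> int:
    """Extract `width` bits of `value` starting at bit position `low`."""
    return (value >> low) & ((1 << width) - 1)


def decode_psl(psl: int) -> dict:
    """Return a dict mapping PSL field names to their decoded values."""
    return {
        "C": bits(psl, 0, 1),        # carry
        "V": bits(psl, 1, 1),        # overflow
        "Z": bits(psl, 2, 1),        # zero
        "N": bits(psl, 3, 1),        # negative
        "T": bits(psl, 4, 1),        # trace enable
        "IV": bits(psl, 5, 1),       # integer overflow trap enable
        "FU": bits(psl, 6, 1),       # floating underflow enable
        "DV": bits(psl, 7, 1),       # decimal overflow trap enable
        "IPL": bits(psl, 16, 5),     # interrupt priority level
        "PRV_MOD": MODES[bits(psl, 22, 2)],  # previous access mode
        "CUR_MOD": MODES[bits(psl, 24, 2)],  # current access mode
        "IS": bits(psl, 26, 1),      # running on the interrupt stack
        "FPD": bits(psl, 27, 1),     # first part done
        "TP": bits(psl, 30, 1),      # trace pending
        "CM": bits(psl, 31, 1),      # compatibility mode
    }


if __name__ == "__main__":
    # PSL value shown in both ddb dumps of this report.
    for name, value in decode_psl(0xdf0008).items():
        print(f"{name:8} {value}")
```

For 0xdf0008 it reports current mode kernel, previous mode user, IPL 31, IS clear and only the N condition code set. Kernel mode entered from user mode is what one would expect given the _syscall frames at the bottom of both tracebacks.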

To test whether it's a network problem, I installed the system on disk and made
sure that there was not a single access to network devices (I even removed the
transceiver); it happened again.

I thought it had to do with paging/swapping, since it seems to happen when a
(partly) paged-out process is woken up, so I repeated the test without
configuring a swap device; it made no difference (well, I guess it died a
little earlier).

BTW: With only 8MB instead of 16MB RAM, System 2 dies as early as System 1.

>How-To-Repeat:
Build a current kernel, boot it on a VT1300 (or VS 3100m30), a VS 4000/VLC, or
maybe any VAX without much memory, and try to build a kernel.
It's very easy for me to reproduce, so if you need more information that I could
gather from the debugger, please mail me.

>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: