Subject: Problems with KA650 and -current
To: None <port-vax@netbsd.org>
From: Tom Ivar Helbekkmo <tih@nhh.no>
List: port-vax
Date: 12/10/1998 07:48:01
I'm still unable to get my KA650 to run NetBSD-current (it's stuck at
1.3F, which is rock solid), but now I have a lead on the problem. I'm
in the process of upgrading my home systems (an i386, a Sparc and this
VAX) to -current, and the VAX crashes in exactly the same way it has
every time I've tried since June: it boots the new kernel OK, but when
it gets loaded down a bit, memory-wise (running 'make obj' in /usr/src
preparatory to building things will do it every time, very quickly, as
'make' is a real memory hog), it either gets a page fault in kernel
mode directly, or it does what this dmesg output shows:

panic: vref used where vget required
syncing disks... panic: lockmgr: locking against myself
NetBSD 1.3I (LUDWIG) #0: Wed Dec 9 04:41:42 CET 1998
tih@ludwig.Hamartun.Priv.NO:/usr/local/netbsd/src/sys/arch/vax/compile/LUDWIG
MicroVAX 3500/3600
realmem = 16736256
panic: Segv in kernel mode: pc 86b4a010 addr 86b4a010
syncing disks... panic: ptelen fault in system space: addr ffffffb0 pc 8006dfac

It's rather interesting how the automatic reboot after the crash
always fails the first time, but the next automatic reboot following
the above is fine. (Well, most of the time: it sometimes gets stuck
so badly I have to turn it off and on to get it running again.) In
any case, I've been suspecting a problem related to memory handling,
possibly triggered by a quirk in the particular revision of the KA650
I'm running.

But then, something interesting happened: this time, there was a fresh
snapshot on ftp.netbsd.org, so I fetched the GENERIC kernel from that,
and tried. This crashes too, but in a different way -- could it be
that it's got more debug code in it than what I've got in the ones I
build? Anyway, it gets:

panic: malloc: out of space in kmem_map

...and then drops into the debugger.

Aha! There _is_ memory trouble in the kernel! So, I looked more
closely at the dmesg output, and quickly found something that's quite
interesting. My 1.3F kernel displays:

MicroVAX 3500/3600
realmem = 16736256
avail mem = 12033024
Using 817 buffers containing 836608 bytes of memory.

And the 1.3I from the snapshot (my locally built one reports the same
buffer figures, with avail mem = 13799424):

MicroVAX 3500/3600
realmem = 16736256
avail mem = 13955072
Using 25 buffers containing 102400 bytes of memory.

Note that buffer usage! It should be approximately 5% of 16 megs,
right? That's about 800 kilobytes, so 1.3F does it right. 1.3I ends
up with about 1/8 of that size, though, and, since each buffer is now
4K instead of 1K (due to the new 4K page size -- right?), only about
1/32 of the buffer count. So the buffer count is consistent with the
total buffer size, but that size is _not_ what the code in
/sys/arch/vax/vax/machdep.c seems to be intended to produce.
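As a quick sanity check on those figures, here is a minimal sketch in
Python. Only the numbers come from the dmesg output above; the
function and variable names are made up for illustration and have
nothing to do with the actual code in machdep.c.

```python
# Cross-check of the buffer figures quoted from the two dmesg outputs.
# All names here are hypothetical; only the numbers come from the logs.

REALMEM = 16736256          # "realmem" as reported by both kernels

# (buffer count, total buffer bytes) as printed by each kernel
OBSERVED = {
    "1.3F": (817, 836608),
    "1.3I": (25, 102400),
}


def summarize(nbuf, bufbytes, realmem=REALMEM):
    """Return the per-buffer size and the share of real memory used."""
    return {
        "bytes_per_buffer": bufbytes // nbuf,
        "percent_of_realmem": 100.0 * bufbytes / realmem,
    }


if __name__ == "__main__":
    for name, (nbuf, bufbytes) in OBSERVED.items():
        s = summarize(nbuf, bufbytes)
        print("%s: %d buffers x %d bytes = %.2f%% of realmem"
              % (name, nbuf, s["bytes_per_buffer"], s["percent_of_realmem"]))

    f_count, f_bytes = OBSERVED["1.3F"]
    i_count, i_bytes = OBSERVED["1.3I"]
    # How much smaller the 1.3I buffer cache is, by size and by count
    print("size ratio 1.3I/1.3F:  1/%.1f" % (f_bytes / i_bytes))
    print("count ratio 1.3I/1.3F: 1/%.1f" % (f_count / i_count))
```

Run as written, this puts 1.3F at just under 5% of real memory with
1K buffers, and 1.3I at roughly 0.6% with 4K buffers.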
I'm thinking that if this is wrong, it's reasonable to assume that
similar calculation errors might show up elsewhere, and that might
cause the kernel to underallocate internal structures badly, which can
be expected to cause no end of trouble -- right? I've only studied
the code in machdep.c so far, and can't see what's wrong, but maybe
someone else has an idea? Ragge? Does this suggest anything to you?

For reference, here's my kernel config file, and a log of dmesg output
showing an attempt to run a locally compiled kernel, the transition
back to 1.3F, and the transition to the kernel from the snapshot. At
the end of this sequence, the trap into the debugger occurred.
#
# LUDWIG, a KA650 with 16MB RAM
#
# DHV11 760500* 0310*
# TK50 774500 0260
# DEQNA 774440 0120
# KDA50 772150 0154 CMD CQD-220 (disk)
# TU81 760444* 0304* CMD CQD-220 (tape)
# RQDX3 760334* 0300*
include "arch/vax/conf/std.vax"
# Here are all different supported CPU types listed.
#options "VAX8600"
#options "VAX8200"
#options "VAX780"
#options "VAX750"
options "VAX630" # MV II
options "VAX650" # MV III, 3600, 3800, 3900
#options "VAX410" # VS 2000
#options "VAX43" # VS 3100/76
#options "VAX46" # VS 4000/60
# Max users on system; this is just a hint
maxusers 8
# Kernel compiled-in symbolic debugger & system call tracer
#options DDB
#options DDB_HISTORY_SIZE=20
#options DDB_ONPANIC=0
options KTRACE # system call tracing, a la ktrace(1)
#options KMEMSTATS # kernel memory statistics (vmstat -m)
options DIAGNOSTIC # cheap kernel consistency checks
#options DEBUG # expensive debugging checks/support
#options SCSIVERBOSE # Verbose SCSI errors
# Network support
#options GATEWAY
options INET
#options DFLDSIZ="(24*1024*1024)" # default is 16 megabytes
#options DFLSSIZ="(8*1024*1024)" # default is 512 kilobytes
options NTP # Kernel PLL for xntpd
# All supported filesystem types
file-system FFS
#file-system LFS
file-system MFS
file-system NFS
file-system CD9660
#file-system FDESC
file-system KERNFS
file-system NULLFS
file-system PORTAL
file-system PROCFS
#file-system UMAPFS
file-system UNION
#options QUOTA
options FFS_EI # FFS Endian Independent support
options NFSSERVER
#options NFS_BOOT_BOOTPARAM # Use the Sun way for netbooting.
# executable+unreadable and secure+setuid script options
options FDSCRIPTS
options SETUIDSCRIPTS
# System V shared memory & semaphores support.
options SYSVMSG
options SYSVSEM
options SYSVSHM
#options SHMMAXPGS=1024 # 64 pages is the default
#options SHMSEG=32 # 8 segments is the default
# Old compat stuff; needed to run 4.3BSD Reno programs.
# Note that if COMPAT_ULTRIX is set, you lose compatibility with
# 4.3BSD Reno programs and get Ultrix compatibility instead.
# (They cannot coexist).
options COMPAT_VAX1K # Must be present to run pre-1.4 binaries.
options COMPAT_43
#options COMPAT_09
#options COMPAT_10
options COMPAT_11
options COMPAT_12
options COMPAT_13
#options COMPAT_ULTRIX
#options TCP_COMPAT_42
options LKM
config netbsd root on ra0a type ffs
mainbus0 at root
cpu0 at mainbus0 # Only one CPU so far.
uba0 at mainbus0 # MicroVAXen only have QBUS.
uda0 at uba0 csr 0172150 # CMD CQD-220
mscpbus0 at uda0
ra0 at mscpbus0 drive 0
ra1 at mscpbus0 drive 1
ra2 at mscpbus0 drive 2
ra3 at mscpbus0 drive 3
uda1 at uba0 csr 0160334 # RQDX3
mscpbus1 at uda1
ra4 at mscpbus1 drive 0
ra5 at mscpbus1 drive 1
ra6 at mscpbus1 drive 2
ra7 at mscpbus1 drive 3
mtc0 at uba0 csr 0174500 # TQK50
mscpbus2 at mtc0
mt0 at mscpbus2 drive 0
mtc1 at uba0 csr 0160444 # CMD CQD-220
mscpbus3 at mtc1
mt1 at mscpbus3 drive 0
qe0 at uba0 csr 0174440 # DEQNA/DELQA
qd0 at uba? csr 0177400 # QDSS
dhu0 at uba? csr 0160500 # DHU-11
pseudo-device loop 1
pseudo-device pty 32
pseudo-device bpfilter 8 # Not supported by de yet.
pseudo-device sl 1
pseudo-device ppp 1
pseudo-device tun 1
#pseudo-device gre 2 # generic L3 over IP tunnel
pseudo-device tb 1
pseudo-device vnd 4
pseudo-device ccd 4
pseudo-device raid 4 # RAIDframe disk driver
# rnd is EXPERIMENTAL
pseudo-device rnd # /dev/random and in-kernel generator
#
# eof
#
NetBSD 1.3I (LUDWIG) #0: Wed Dec 9 04:41:42 CET 1998
tih@ludwig.Hamartun.Priv.NO:/usr/local/netbsd/src/sys/arch/vax/compile/LUDWIG
MicroVAX 3500/3600
realmem = 16736256
avail mem = 13799424
Using 25 buffers containing 102400 bytes of memory.
mainbus0 (root)
cpu0 at mainbus0: KA650, CVAX microcode rev 4 Firmware rev 18
uba0 at mainbus0: Q22
mtc0 at uba0 csr 174500 vec 774 ipl 17
mscpbus2 at mtc0: version 4 model 3
mscpbus2: DMA burst size set to 4
mt0 at mscpbus2 drive 0: TK50
uda0 at uba0 csr 172150 vec 770 ipl 17
mscpbus0 at uda0: version 6 model 13
mscpbus0: DMA burst size set to 4
ra0 at mscpbus0 drive 0: RA82
uda1 at uba0 csr 160334 vec 764 ipl 17
mscpbus1 at uda1: version 2 model 3
mscpbus1: DMA burst size set to 4
ra4 at mscpbus1 drive 0: 52
ra5 at mscpbus1 drive 1: 52
RX50 at mscpbus1 drive 2 not configured
RX50 at mscpbus1 drive 3 not configured
qe0 at uba0 csr 174440 vec 760 ipl 17
qe0: deqna, hardware address 08:00:2b:02:8e:24
dhu0 at uba0 csr 160500 vec 310 ipl 17
dhu0: rom(1) version 2 rom(0) version 2
Kernelized RAIDframe activated
boot device: ra0
root on ra0a dumps on ra0b
ra0: size 1284720 sectors
panic: vref used where vget required
syncing disks... panic: lockmgr: locking against myself
NetBSD 1.3I (LUDWIG) #0: Wed Dec 9 04:41:42 CET 1998
tih@ludwig.Hamartun.Priv.NO:/usr/local/netbsd/src/sys/arch/vax/compile/LUDWIG
MicroVAX 3500/3600
realmem = 16736256
panic: Segv in kernel mode: pc 86b4a010 addr 86b4a010
syncing disks... panic: ptelen fault in system space: addr ffffffb0 pc 8006dfac
NetBSD 1.3I (LUDWIG) #0: Wed Dec 9 04:41:42 CET 1998
tih@ludwig.Hamartun.Priv.NO:/usr/local/netbsd/src/sys/arch/vax/compile/LUDWIG
MicroVAX 3500/3600
realmem = 16736256
avail mem = 13799424
Using 25 buffers containing 102400 bytes of memory.
mainbus0 (root)
cpu0 at mainbus0: KA650, CVAX microcode rev 4 Firmware rev 18
uba0 at mainbus0: Q22
mtc0 at uba0 csr 174500 vec 774 ipl 17
mscpbus2 at mtc0: version 4 model 3
mscpbus2: DMA burst size set to 4
mt0 at mscpbus2 drive 0: TK50
uda0 at uba0 csr 172150 vec 770 ipl 17
mscpbus0 at uda0: version 6 model 13
mscpbus0: DMA burst size set to 4
ra0 at mscpbus0 drive 0: RA82
uda1 at uba0 csr 160334 vec 764 ipl 17
mscpbus1 at uda1: version 2 model 3
mscpbus1: DMA burst size set to 4
ra4 at mscpbus1 drive 0: 52
ra5 at mscpbus1 drive 1: 52
RX50 at mscpbus1 drive 2 not configured
RX50 at mscpbus1 drive 3 not configured
qe0 at uba0 csr 174440 vec 760 ipl 17
qe0: deqna, hardware address 08:00:2b:02:8e:24
dhu0 at uba0 csr 160500 vec 310 ipl 17
dhu0: rom(1) version 2 rom(0) version 2
Kernelized RAIDframe activated
boot device: ra0
root on ra0a dumps on ra0b
ra0: size 1284720 sectors
syncing disks... 4 4 done
NetBSD 1.3F (LUDWIG) #0: Tue Jun 16 18:10:13 CEST 1998
tih@ludwig.Hamartun.Priv.NO:/sys/arch/vax/compile/LUDWIG
MicroVAX 3500/3600
realmem = 16736256
avail mem = 12033024
Using 817 buffers containing 836608 bytes of memory.
backplane0 (root)
cpu0 at backplane0: KA650, CVAX microcode rev 4 Firmware rev 18
uba0 at backplane0: Q22
mtc0 at uba0 csr 174500 vec 774 ipl 17
mscpbus1 at mtc0: version 4 model 3
mscpbus1: DMA burst size set to 4
mt0 at mscpbus1 drive 0: TK50
uda0 at uba0 csr 172150 vec 770 ipl 17
mscpbus0 at uda0: version 6 model 13
mscpbus0: DMA burst size set to 4
ra0 at mscpbus0 drive 0: RA82
qe0 at uba0 csr 174440 vec 764 ipl 17
qe0: deqna, hardware address 08:00:2b:02:8e:24
boot device: ra0
root on ra0a dumps on ra0b
ra0: size 1284720 sectors
TODR too small - CHECK AND RESET THE DATE.
syncing disks... 10 10 4 done
NetBSD 1.3I (GENERIC) #139: Sun Nov 29 18:39:41 CET 1998
ragge@subzero:/multi/src/sys/arch/vax/compile/GENERIC
MicroVAX 3500/3600
realmem = 16736256
avail mem = 13955072
Using 25 buffers containing 102400 bytes of memory.
mainbus0 (root)
cpu0 at mainbus0: KA650, CVAX microcode rev 4 Firmware rev 18
uba0 at mainbus0: Q22
mtc0 at uba0 csr 174500 vec 774 ipl 17
mscpbus0 at mtc0: version 4 model 3
mscpbus0: DMA burst size set to 4
mt0 at mscpbus0 drive 0: TK50
uda0 at uba0 csr 172150 vec 770 ipl 17
mscpbus1 at uda0: version 6 model 13
mscpbus1: DMA burst size set to 4
ra0 at mscpbus1 drive 0: RA82
uda1 at uba0 csr 160334 vec 764 ipl 17
mscpbus2 at uda1: version 2 model 3
mscpbus2: DMA burst size set to 4
ra1 at mscpbus2 drive 0: 52
ra2 at mscpbus2 drive 1: 52
rx0 at mscpbus2 drive 2: RX50
rx1 at mscpbus2 drive 3: RX50
qe0 at uba0 csr 174440 vec 760 ipl 17
qe0: deqna, hardware address 08:00:2b:02:8e:24
boot device: ra0
root on ra0a dumps on ra0b
ra0: size 1284720 sectors
mountroot: trying nfs...
mountroot: trying ffs...
root file system type: ffs
init: copying out path `/sbin/init' 11
-tih
--
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"