Subject: Problems with KA650 and -current
To: None <port-vax@netbsd.org>
From: Tom Ivar Helbekkmo <tih@nhh.no>
List: port-vax
Date: 12/10/1998 07:48:01

I'm still unable to get my KA650 to run NetBSD-current (it's stuck at
1.3F, which is rock solid), but now I have a lead on the problem.  I'm
in the process of upgrading my home systems (an i386, a Sparc and this
VAX) to -current, and the VAX crashes in exactly the same way it has
every time I've tried since June: it boots the new kernel OK, but when
it gets loaded down a bit, memory-wise (running 'make obj' in /usr/src
preparatory to building things will do it every time, very quickly, as
'make' is a real memory hog), it either gets a page fault in kernel
mode directly, or it does what this dmesg output shows:

panic: vref used where vget required
syncing disks... panic: lockmgr: locking against myself
NetBSD 1.3I (LUDWIG) #0: Wed Dec  9 04:41:42 CET 1998
    tih@ludwig.Hamartun.Priv.NO:/usr/local/netbsd/src/sys/arch/vax/compile/LUDWIG

MicroVAX 3500/3600
realmem = 16736256
panic: Segv in kernel mode: pc 86b4a010 addr 86b4a010
syncing disks... panic: ptelen fault in system space: addr ffffffb0 pc 8006dfac

It's rather interesting how the automatic reboot after the crash
always fails the first time, but the next automatic reboot following
the above is fine.  (Well, most of the time: it sometimes gets stuck
so badly I have to turn it off and on to get it running again.)  In
any case, I've been suspecting a problem in memory handling,
possibly triggered by a quirk in the particular revision of the KA650
I'm running.

But then, something interesting happened: this time, there was a fresh
snapshot on ftp.netbsd.org, so I fetched the GENERIC kernel from that,
and tried.  This crashes too, but in a different way -- could it be
that it's got more debug code in it than what I've got in the ones I
build?  Anyway, it gets:

panic: malloc: out of space in kmem_map

...and then drops into the debugger.

Aha!  There _is_ memory trouble in the kernel!  So, I looked more
closely at the dmesg output, and quickly found something that's quite
interesting.  My 1.3F kernel displays:

MicroVAX 3500/3600
realmem = 16736256
avail mem = 12033024
Using 817 buffers containing 836608 bytes of memory.

And the 1.3I (this is from the snapshot kernel; my locally built one
reports a slightly lower avail mem, but the same buffer line):

MicroVAX 3500/3600
realmem = 16736256
avail mem = 13955072
Using 25 buffers containing 102400 bytes of memory.

Note that buffer usage!  It should be approximately 5% of 16 megs,
right?  That's about 800 kilobytes, so 1.3F does it right.  1.3I ends
up with only about 1/8 of that size, though.  Its buffer count is down
to about 1/32 (25 versus 817), which is that same 1/8 divided by a
further 4, because each buffer is now 4K instead of 1K due to the new
4K page size -- right?  So the buffer count is correct in relation to
the total buffer size, but that size is _not_ what the code in
/sys/arch/vax/vax/machdep.c seems to be intended to produce.
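
To make the arithmetic explicit, here is a small Python sketch that
uses only the numbers from the two dmesg excerpts above.  The 5% figure
is my reading of what machdep.c is aiming for, not something I have
confirmed, and the names (REALMEM, buffer_stats) are made up for the
illustration.

```python
# Figures copied from the 1.3F and 1.3I dmesg excerpts.
REALMEM = 16736256

kernels = {
    "1.3F": {"nbuf": 817, "bufbytes": 836608},
    "1.3I": {"nbuf": 25, "bufbytes": 102400},
}


def buffer_stats(nbuf, bufbytes, realmem=REALMEM):
    """Return (bytes per buffer, buffer cache share of realmem in percent)."""
    return bufbytes // nbuf, 100.0 * bufbytes / realmem


# Assumed target: roughly 5% of physical memory goes to buffers.
target = REALMEM // 20
print("5%% of realmem: %d bytes" % target)

for name, k in kernels.items():
    per_buf, share = buffer_stats(k["nbuf"], k["bufbytes"])
    print("%s: %d buffers x %d bytes = %.2f%% of realmem"
          % (name, k["nbuf"], per_buf, share))

# How far apart the two kernels are, in total size and in buffer count.
size_ratio = kernels["1.3F"]["bufbytes"] / kernels["1.3I"]["bufbytes"]
count_ratio = kernels["1.3F"]["nbuf"] / kernels["1.3I"]["nbuf"]
print("size ratio %.2f, count ratio %.2f" % (size_ratio, count_ratio))
```

The size ratio comes out a little above 8 and the count ratio is exactly
4 times that, matching the change from 1K to 4K per buffer.  A factor of
8 is also 4096/512, the new page size over the VAX hardware page size,
but whether that is related to the shortfall is only a guess.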

I'm thinking that if this is wrong, it's reasonable to assume that
similar calculation errors might show up elsewhere, and that might
cause the kernel to underallocate internal structures badly, which can
be expected to cause no end of trouble -- right?  I've only studied
the code in machdep.c so far, and can't see what's wrong, but maybe
someone else has an idea?  Ragge?  Does this suggest anything to you?
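
One rough cross-check, sketched in Python: the avail mem lines can be
compared against the change in buffer space.  This assumes that avail
mem is what is left after the kernel has reserved its own memory, and
it ignores that the kernels differ in version and configuration, so
the result is a hint at best.  The 1.3I LUDWIG figure is taken from the
dmesg log further down; the function name is made up.

```python
# Values copied from the dmesg output in this message.
AVAIL_13F = 12033024
AVAIL_13I = {"LUDWIG": 13799424, "GENERIC": 13955072}
BUF_13F = 836608
BUF_13I = 102400


def unexplained_gain(avail_old, avail_new, buf_old, buf_new):
    """Extra available memory not accounted for by the smaller buffer space."""
    return (avail_new - avail_old) - (buf_old - buf_new)


for name, avail in AVAIL_13I.items():
    extra = avail - AVAIL_13F
    rest = unexplained_gain(AVAIL_13F, avail, BUF_13F, BUF_13I)
    print("%s: %d bytes more available, %d not explained by buffers"
          % (name, extra, rest))
```

In both cases the smaller buffer space accounts for well under half of
the extra available memory, which would at least be consistent with
other allocations having shrunk too.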

For reference, here's my kernel config file, and a log of dmesg output
showing an attempt to run a locally compiled kernel, the transition
back to 1.3F, and the transition to the kernel from the snapshot.  At
the end of this sequence, the trap into the debugger occurred.

#
#	LUDWIG, a KA650 with 16MB RAM
#
#	DHV11  760500*  0310*
#	TK50   774500   0260 
#	DEQNA  774440   0120 
#	KDA50  772150   0154 	CMD CQD-220 (disk)
#	TU81   760444*  0304*	CMD CQD-220 (tape)
#	RQDX3  760334*  0300*

include		"arch/vax/conf/std.vax"

# Here are all different supported CPU types listed.
#options 	"VAX8600"
#options 	"VAX8200"
#options 	"VAX780"
#options 	"VAX750"
options         "VAX630"	# MV II
options         "VAX650"	# MV III, 3600, 3800, 3900
#options 	"VAX410"	# VS 2000
#options 	"VAX43"		# VS 3100/76
#options 	"VAX46"		# VS 4000/60

# Max users on system; this is just a hint
maxusers	8

# Kernel compiled-in symbolic debugger & system call tracer
#options 	DDB
#options 	DDB_HISTORY_SIZE=20
#options 	DDB_ONPANIC=0
options 	KTRACE		# system call tracing, a la ktrace(1)
#options 	KMEMSTATS	# kernel memory statistics (vmstat -m)
options 	DIAGNOSTIC	# cheap kernel consistency checks
#options 	DEBUG		# expensive debugging checks/support
#options 	SCSIVERBOSE	# Verbose SCSI errors

# Network support
#options 	GATEWAY
options 	INET

#options 	DFLDSIZ="(24*1024*1024)"	# default is 16 megabytes
#options 	DFLSSIZ="(8*1024*1024)"		# default is 512 kilobytes

options 	NTP		# Kernel PLL for xntpd

# All supported filesystem types
file-system 	FFS
#file-system 	LFS
file-system 	MFS
file-system 	NFS
file-system 	CD9660
#file-system 	FDESC
file-system 	KERNFS
file-system 	NULLFS
file-system	PORTAL
file-system 	PROCFS
#file-system 	UMAPFS
file-system 	UNION

#options 	QUOTA
options 	FFS_EI		# FFS Endian Independant support
options 	NFSSERVER
#options 	NFS_BOOT_BOOTPARAM	# Use the Sun way for netbooting.

# executable+unreadable and secure+setuid script options
options 	FDSCRIPTS
options 	SETUIDSCRIPTS

# System V shared memory & semaphores support.
options 	SYSVMSG
options 	SYSVSEM
options 	SYSVSHM
#options 	SHMMAXPGS=1024	# 64 pages is the default
#options 	SHMSEG=32	# 8 segments is the default

# Old compat stuff; needed to run 4.3BSD Reno programs.
# Note that if COMPAT_ULTRIX is set, you lose compatibility with
# 4.3BSD Reno programs and get Ultrix compatibility instead.
# (They cannot coexist).
options 	COMPAT_VAX1K	# Must be present to run pre-1.4 binaries.
options 	COMPAT_43
#options 	COMPAT_09
#options 	COMPAT_10
options 	COMPAT_11
options 	COMPAT_12
options 	COMPAT_13
#options 	COMPAT_ULTRIX
#options 	TCP_COMPAT_42

options 	LKM

config		netbsd	root on ra0a type ffs

mainbus0	at root

cpu0		at mainbus0	# Only one CPU so far.
uba0		at mainbus0	# MicroVAXen only have QBUS.

uda0		at uba0	csr 0172150	# CMD CQD-220
mscpbus0	at uda0

ra0		at mscpbus0 drive 0
ra1		at mscpbus0 drive 1
ra2		at mscpbus0 drive 2
ra3		at mscpbus0 drive 3

uda1		at uba0	csr 0160334	# RQDX3
mscpbus1	at uda1

ra4		at mscpbus1 drive 0
ra5		at mscpbus1 drive 1
ra6		at mscpbus1 drive 2
ra7		at mscpbus1 drive 3

mtc0		at uba0 csr 0174500	# TQK50
mscpbus2	at mtc0

mt0		at mscpbus2 drive 0

mtc1		at uba0 csr 0160444	# CMD CQD-220
mscpbus3	at mtc1

mt1		at mscpbus3 drive 0

qe0		at uba0 csr 0174440	# DEQNA/DELQA
qd0		at uba? csr 0177400	# QDSS
dhu0		at uba? csr 0160500	# DHU-11

pseudo-device   loop	1
pseudo-device   pty	32
pseudo-device	bpfilter 8	# Not supported by de yet.
pseudo-device	sl	1
pseudo-device	ppp	1
pseudo-device	tun	1
#pseudo-device	gre		2	# generic L3 over IP tunnel
pseudo-device	tb	1
pseudo-device	vnd	4
pseudo-device	ccd	4
pseudo-device	raid	4		# RAIDframe disk driver
# rnd is EXPERIMENTAL
pseudo-device	rnd		# /dev/random and in-kernel generator

#
#	eof
#

NetBSD 1.3I (LUDWIG) #0: Wed Dec  9 04:41:42 CET 1998
    tih@ludwig.Hamartun.Priv.NO:/usr/local/netbsd/src/sys/arch/vax/compile/LUDWIG

MicroVAX 3500/3600
realmem = 16736256
avail mem = 13799424
Using 25 buffers containing 102400 bytes of memory.
mainbus0 (root)
cpu0 at mainbus0: KA650, CVAX microcode rev 4 Firmware rev 18
uba0 at mainbus0: Q22
mtc0 at uba0 csr 174500 vec 774 ipl 17
mscpbus2 at mtc0: version 4 model 3
mscpbus2: DMA burst size set to 4
mt0 at mscpbus2 drive 0: TK50
uda0 at uba0 csr 172150 vec 770 ipl 17
mscpbus0 at uda0: version 6 model 13
mscpbus0: DMA burst size set to 4
ra0 at mscpbus0 drive 0: RA82
uda1 at uba0 csr 160334 vec 764 ipl 17
mscpbus1 at uda1: version 2 model 3
mscpbus1: DMA burst size set to 4
ra4 at mscpbus1 drive 0:   52
ra5 at mscpbus1 drive 1:   52
RX50 at mscpbus1 drive 2 not configured
RX50 at mscpbus1 drive 3 not configured
qe0 at uba0 csr 174440 vec 760 ipl 17
qe0: deqna, hardware address 08:00:2b:02:8e:24
dhu0 at uba0 csr 160500 vec 310 ipl 17
dhu0: rom(1) version 2 rom(0) version 2
Kernelized RAIDframe activated
boot device: ra0
root on ra0a dumps on ra0b
ra0: size 1284720 sectors
panic: vref used where vget required
syncing disks... panic: lockmgr: locking against myself
NetBSD 1.3I (LUDWIG) #0: Wed Dec  9 04:41:42 CET 1998
    tih@ludwig.Hamartun.Priv.NO:/usr/local/netbsd/src/sys/arch/vax/compile/LUDWIG

MicroVAX 3500/3600
realmem = 16736256
panic: Segv in kernel mode: pc 86b4a010 addr 86b4a010
syncing disks... panic: ptelen fault in system space: addr ffffffb0 pc 8006dfac
NetBSD 1.3I (LUDWIG) #0: Wed Dec  9 04:41:42 CET 1998
    tih@ludwig.Hamartun.Priv.NO:/usr/local/netbsd/src/sys/arch/vax/compile/LUDWIG

MicroVAX 3500/3600
realmem = 16736256
avail mem = 13799424
Using 25 buffers containing 102400 bytes of memory.
mainbus0 (root)
cpu0 at mainbus0: KA650, CVAX microcode rev 4 Firmware rev 18
uba0 at mainbus0: Q22
mtc0 at uba0 csr 174500 vec 774 ipl 17
mscpbus2 at mtc0: version 4 model 3
mscpbus2: DMA burst size set to 4
mt0 at mscpbus2 drive 0: TK50
uda0 at uba0 csr 172150 vec 770 ipl 17
mscpbus0 at uda0: version 6 model 13
mscpbus0: DMA burst size set to 4
ra0 at mscpbus0 drive 0: RA82
uda1 at uba0 csr 160334 vec 764 ipl 17
mscpbus1 at uda1: version 2 model 3
mscpbus1: DMA burst size set to 4
ra4 at mscpbus1 drive 0:   52
ra5 at mscpbus1 drive 1:   52
RX50 at mscpbus1 drive 2 not configured
RX50 at mscpbus1 drive 3 not configured
qe0 at uba0 csr 174440 vec 760 ipl 17
qe0: deqna, hardware address 08:00:2b:02:8e:24
dhu0 at uba0 csr 160500 vec 310 ipl 17
dhu0: rom(1) version 2 rom(0) version 2
Kernelized RAIDframe activated
boot device: ra0
root on ra0a dumps on ra0b
ra0: size 1284720 sectors
syncing disks... 4 4 done
NetBSD 1.3F (LUDWIG) #0: Tue Jun 16 18:10:13 CEST 1998
    tih@ludwig.Hamartun.Priv.NO:/sys/arch/vax/compile/LUDWIG

MicroVAX 3500/3600
realmem = 16736256
avail mem = 12033024
Using 817 buffers containing 836608 bytes of memory.
backplane0 (root)
cpu0 at backplane0: KA650, CVAX microcode rev 4 Firmware rev 18
uba0 at backplane0: Q22
mtc0 at uba0 csr 174500 vec 774 ipl 17
mscpbus1 at mtc0: version 4 model 3
mscpbus1: DMA burst size set to 4
mt0 at mscpbus1 drive 0: TK50
uda0 at uba0 csr 172150 vec 770 ipl 17
mscpbus0 at uda0: version 6 model 13
mscpbus0: DMA burst size set to 4
ra0 at mscpbus0 drive 0: RA82
qe0 at uba0 csr 174440 vec 764 ipl 17
qe0: deqna, hardware address 08:00:2b:02:8e:24
boot device: ra0
root on ra0a dumps on ra0b
ra0: size 1284720 sectors
TODR too small - CHECK AND RESET THE DATE.
syncing disks... 10 10 4 done
NetBSD 1.3I (GENERIC) #139: Sun Nov 29 18:39:41 CET 1998
    ragge@subzero:/multi/src/sys/arch/vax/compile/GENERIC

MicroVAX 3500/3600
realmem = 16736256
avail mem = 13955072
Using 25 buffers containing 102400 bytes of memory.
mainbus0 (root)
cpu0 at mainbus0: KA650, CVAX microcode rev 4 Firmware rev 18
uba0 at mainbus0: Q22
mtc0 at uba0 csr 174500 vec 774 ipl 17
mscpbus0 at mtc0: version 4 model 3
mscpbus0: DMA burst size set to 4
mt0 at mscpbus0 drive 0: TK50
uda0 at uba0 csr 172150 vec 770 ipl 17
mscpbus1 at uda0: version 6 model 13
mscpbus1: DMA burst size set to 4
ra0 at mscpbus1 drive 0: RA82
uda1 at uba0 csr 160334 vec 764 ipl 17
mscpbus2 at uda1: version 2 model 3
mscpbus2: DMA burst size set to 4
ra1 at mscpbus2 drive 0:   52
ra2 at mscpbus2 drive 1:   52
rx0 at mscpbus2 drive 2: RX50
rx1 at mscpbus2 drive 3: RX50
qe0 at uba0 csr 174440 vec 760 ipl 17
qe0: deqna, hardware address 08:00:2b:02:8e:24
boot device: ra0
root on ra0a dumps on ra0b
ra0: size 1284720 sectors
mountroot: trying nfs...
mountroot: trying ffs...
root file system type: ffs
init: copying out path `/sbin/init' 11
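
For anyone who wants to line up the boots in a log like the one above,
here is a minimal Python sketch that pulls the version, avail mem and
buffer figures out of dmesg text.  The regular expressions are written
against the line formats shown in this message only; the function name
and the shortened SAMPLE text are made up for the illustration.

```python
import re

# Patterns matching the dmesg line formats seen in this message.
BANNER = re.compile(r"^NetBSD (\S+) \((\w+)\) #\d+:")
AVAIL = re.compile(r"^avail mem = (\d+)")
BUFS = re.compile(r"^Using (\d+) buffers containing (\d+) bytes of memory\.")


def summarize_dmesg(text):
    """Return one record per boot banner found in the log text."""
    boots = []
    for line in text.splitlines():
        m = BANNER.match(line)
        if m:
            boots.append({"version": m.group(1), "config": m.group(2),
                          "avail": None, "nbuf": None, "bufbytes": None})
            continue
        if not boots:
            continue  # ignore anything before the first banner
        m = AVAIL.match(line)
        if m:
            boots[-1]["avail"] = int(m.group(1))
            continue
        m = BUFS.match(line)
        if m:
            boots[-1]["nbuf"] = int(m.group(1))
            boots[-1]["bufbytes"] = int(m.group(2))
    return boots


# Shortened excerpt of the log above, for demonstration.
SAMPLE = """\
NetBSD 1.3F (LUDWIG) #0: Tue Jun 16 18:10:13 CEST 1998
realmem = 16736256
avail mem = 12033024
Using 817 buffers containing 836608 bytes of memory.
NetBSD 1.3I (GENERIC) #139: Sun Nov 29 18:39:41 CET 1998
realmem = 16736256
avail mem = 13955072
Using 25 buffers containing 102400 bytes of memory.
"""

for boot in summarize_dmesg(SAMPLE):
    if boot["nbuf"]:
        per_buf = boot["bufbytes"] // boot["nbuf"]
        print(boot["version"], boot["config"], boot["avail"],
              boot["nbuf"], per_buf)
```

A boot that panics before printing its memory lines simply ends up with
None in those fields, so the crashed boots remain visible in the result.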

-tih
-- 
Popularity is the hallmark of mediocrity.  --Niles Crane, "Frasier"