Subject: strange problems with SUN4C-only diskless kernels vs DEBUG and/or DIAGNOSTIC...
To: NetBSD/sparc Discussion List <port-sparc@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: port-sparc
Date: 02/06/2002 16:32:59
So since my main workstation (the SS1+ with bwtwo) is FUBAR because of
the constantly crashing Xserver, I've been playing with building custom
kernels.  I'd been running GENERIC because the previous half-baked
attempt to build a custom one had failed with a strange panic which
apparently happens just before 'init' starts (manual transcription):

	nfs_boot: my_mask=255.255.255.0
	data fault: pc=0xf0081098 addr=0x64 ser=80<INVAL>
	panic: kernel fault
	Stopped in pid 0 (swapper) at   cpu_Debugger+0x8: jmpl [%i7 + 0x8], %g0
	db> trace
	panic(0xf025d508, 0xf025d400, 0x64, 0xf0276530, 0x40, 0x3ff0000) at panic+0x128
	mem_access_fault(0x9, 0x80, 0x64, 0xf081098, 0x4000c1, 0xf02765f8) at mem_access_fault+0x478
	normal_mem_fault(0x44, 0x0, 0x41, 0x4001e2, 0xffffffff, 0x1) at normal_mem_fault+0x28
	soreceive(0xf0296c00, 0x7, 0xf0244ae8, 0x0, 0x6, 0xf029aa64) at soreceive+0x67c
	mi_switch(0xf0296c00, 0x0, 0x1, 0xf0393100, 0x0, 0x0) at mi_switch+0x338
	ltsleep(0xf02a77c8, 0x120, 0xf0251cc8, 0x0, 0x0, 0x0) at ltsleep+0x324
	nfssvc_iod(0xf26ed760, 0xf0002000, 0xf0244ae8, 0x40, 0x6, 0xf029aa64) at nfssvc_iod+0x1d0
	start_nfsio(0xf26ed760, 0x30000000, 0x1a, 0xf02aaa04, 0xf02a4c60, 0x3ff0000) at start_nfsio+0x14
	proc_trampoline(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) at proc_trampoline+0x10
	db> 

Note this was a kernel built without optimisation (hoping I could get a
core dump and attack it with gdb, but there are other problems
preventing that it seems).

So I turned the optimiser back on (it does make a drastic difference in
code size and performance after all!), and most surprisingly now the
kernel just hangs instead of panicking.  The hang is still in mi_switch()
though, with a backtrace similar to the one above.

Hmmm....  what to do.... maybe there's some earlier corruption?  So then
I turn on 'options DEBUG' and 'options DIAGNOSTIC', and the following
panic happens just as /etc/rc ends, obviously well after 'init' starts,
and thus well past the above panic (manual transcription again):

	Wed Feb  6 14:57:32 EST 2002
	panic: chrtoblktbl too small for cdevsw
	Stopped in pid 79 (syslogd) at  cpu_Debugger+0x4: jmpl [%o7 + 0x8], %g0
	db> trace
	chrtoblk(0x0, 0xf0185400, 0xf0068ee8, 0xf0182800, 0x2, 0xf02a0e80) at chrtoblk+0x1c
	spec_open(0x6, 0x18, 0xf0299200, 0xf0068e5c, 0xf2e1e188, 0x5) at spec_open+0x100
	vn_open(0x0, 0xa, 0xf2e78a38, 0x0, 0x6, 0xf2e1e1a8) at vn_open+0x3a4
	sys_open(0x0, 0xf2ec6f28, 0xf2ec6f20, 0xf005ccf8, 0x33, 0xefffe9c8) at sys_open+0x90
	syscall(0x5, 0xf2ec6fb0, 0x0, 0x1df, 0x2f, 0x0) at syscall+0x1f4
	_syscall(0x6c028, 0x9, 0x0, 0x3, 0xefffec40, 0x10) at _syscall+0x120
	db> examine/d nchrdev
	nchrdev:       123
	db> 

Ah Ha!  I say as I look at the code and read the diff of my local
changes to sys/sparc/sparc/conf.c.  That panic() is from a
patch suggested by der Mouse in PR#14388!  My GENERIC kernel was built
before I applied that patch.  Sure enough there's a missing entry in
chrtoblktbl (the last one, for /dev/cgfourteen -- my sources are from
2001/06/24).
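
For anyone who hasn't looked at that code: roughly, the check amounts to
the following.  This is a user-space sketch only, not the actual diff
from the PR; every name in it (chrtoblk_sketch, short_tbl, NODEV_SKETCH,
TBL_TOO_SMALL) is made up for illustration, and a return code stands in
for the panic().

```c
#include <assert.h>
#include <stddef.h>

#define NODEV_SKETCH	(-1)	/* stand-in for the kernel's NODEV */
#define TBL_TOO_SMALL	(-2)	/* where the real kernel would panic() */

/* Hypothetical char-major -> block-major map with only four entries. */
static const int short_tbl[] = {
	/* 0 */	NODEV_SKETCH,
	/* 1 */	NODEV_SKETCH,
	/* 2 */	3,
	/* 3 */	7,
};
#define SHORT_TBL_LEN	(sizeof(short_tbl) / sizeof(short_tbl[0]))

/*
 * Map a character major to a block major.  nchr plays the role of
 * nchrdev, ntbl is the number of entries actually present in tbl.
 * Without the second test, a table shorter than nchr would be read
 * past its end for the highest majors.
 */
static int
chrtoblk_sketch(const int *tbl, size_t ntbl, int nchr, int chrmaj)
{
	if (chrmaj < 0 || chrmaj >= nchr)
		return NODEV_SKETCH;	/* no such character device */
	if ((size_t)nchr > ntbl)
		return TBL_TOO_SMALL;	/* table is missing entries */
	return tbl[chrmaj];
}
```

With four character devices the four-entry table is fine; claim a fifth
without adding a fifth table entry and every lookup trips the check.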

Adding the missing chrtoblktbl entry gives me a running SUN4C-only
kernel, finally!  (all this mostly to get UCONSOLE, though dropping
nearly a megabyte of unneeded driver and CPU support code off the kernel
is also always nice....)

The real mystery though is why does the damn thing work with 'options
DIAGNOSTIC' and/or 'options DEBUG' (both of which I do have in my
GENERIC kernel), and not without?

And what's with the optimiser turning a panic() into a hang!?!?!?


FYI:

   text    data     bss     dec     hex  filename
2288446   93088  228512 2610046  27d37e  netbsd-1.5W-GENERIC
2469655   72784  171728 2714167  296a37  netbsd-1.5W-VERY-no-optim
1414123   73008  171920 1659051  1950ab  netbsd-1.5W-VERY--O2
1456711   75264  172048 1704023  1a0057  netbsd-1.5W-VERY-DIAGNOSTIC+DEBUG
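
The dec column is just text + data + bss, and the "nearly a megabyte"
mentioned earlier is the difference between the GENERIC and -O2 totals.
A small sketch to check that arithmetic (the struct and function names
are made up for illustration; the figures are the ones from the table):

```c
#include <assert.h>

/* One row of size(1) output; the struct name is hypothetical. */
struct kern_size {
	unsigned long text, data, bss;
};

/* The "dec" column: the sum of the three segment sizes. */
static unsigned long
kern_total(struct kern_size s)
{
	return s.text + s.data + s.bss;
}

/* Figures for the GENERIC and the optimised SUN4C-only kernels. */
static const struct kern_size generic_kern = { 2288446UL, 93088UL, 228512UL };
static const struct kern_size o2_kern = { 1414123UL, 73008UL, 171920UL };
```

The two totals differ by 950995 bytes, a bit over 0.9 MB.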


-- 
								Greg A. Woods

+1 416 218-0098;  <gwoods@acm.org>;  <g.a.woods@ieee.org>;  <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>