Subject: port-sparc/10419: under recent sun4m kernels Xsun hangs system the second time it runs
To: None <gnats-bugs@gnats.netbsd.org>
From: None <jbernard@mines.edu>
List: netbsd-bugs
Date: 06/22/2000 11:48:11
>Number:         10419
>Category:       port-sparc
>Synopsis:       under recent sun4m kernels Xsun hangs system the second time it runs
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-sparc-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jun 22 11:49:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Jim Bernard
>Release:        June 14, 2000
>Organization:
	Speaking for myself
>Environment:
System: NetBSD knox 1.4ZD NetBSD 1.4ZD (KNOX-$Revision: 1.10 $) #0: Wed Jun 14 12:33:17 MDT 2000 jbernard@knox:/fh/usr/tmp/compile/sys/arch/sparc/compile/KNOX sparc


>Description:
	Shortly after the recent round of coredumping problems on sun4m
	viking systems was solved, I was able to run X again (along with
	all other programs) but began having the problem that Xsun would
	only run once.  The second time I log in and start X, the server
	hangs and I'm unable to log in either on the console or via a
	network connection.  If I log in again after the first X session
	but do not start Xsun, everything works fine.  I tried updating userland
	(June 14 sources), and that didn't fix it.  I also tried updating
	X (the whole thing, not just the server) (June 21 sources), and that
	made no difference.  I tried eliminating all of my personal
	customizations (moving .xinitrc, .Xresources, and the like out of the
	way, so that only a server and one xterm started up, with no window
	manager), and that made no difference.  So, this clearly has something
	to do with recent kernel changes.

	I am able to break into ddb, and an abbreviated copy of the traceback
	(omitting args) looks like:

	   kbd_zd_rxint +0x70
	   zsc_intr_hard +0x4c
	   zshard +0x44
	   sparc_interrupt +0x120
	   udv_attach +0xa0
	   uvm_mmap +0x12c
	   sys_mmap +0x390
	   syscall +0x1f8
	   _syscall +0xb8

	If I force a dump, gdb reports:

	   #0  0xf003e478 in mi_switch ()
	   #1  0xf003db84 in bpendtsleep ()
	   #2  0xf011404c in uvm_scheduler ()
	   #3  0xf002ebd0 in check_console ()
	   #4  0xf0007318 in cpu_hatch ()
	   can not access 0x707c34, invalid address (707c34)
	   can not access 0x707c34, invalid address (707c34)
	   can not access 0x707c34, invalid address (707c34)
	   can not access 0x707c34, invalid address (707c34)
	   can not access 0xefffffd8, invalid address (efffffd8)
	   can not access 0xefffffd8, invalid address (efffffd8)
	   Cannot access memory at address 0xefffffd8.

	And ps on the dump shows that Xsun was very busy:

	   USER       PID %CPU %MEM   VSZ RSS TT STAT STARTED    TIME COMMAND
	   ...       1365 98.1  0.0  2092   0 ?? R    10:49AM 0:36.00 (Xsun)

	BTW: I've gotten exactly the same results from ddb, gdb, and ps in
	multiple rounds of this -- it's very repeatable, and always the
	second time Xsun is run.

	One other oddity: when X shuts down, the screen turns bright green
	or yellow, and remains that way, except for the portion of the
	screen that has new console text written to it.  This has been
	happening since May, when I began testing the new kernels that had
	the coredumping problem.  It doesn't happen after a minimal X
	session (server and one xterm, no window manager).

	The machine is:

total memory = 31980 KB
avail memory = 27008 KB
using 425 buffers containing 1700 KB of memory
bootpath: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@0,0
mainbus0 (root): SUNW,SPARCstation-20
cpu0 at mainbus0: mid 8: TMS390Z50 v0 or TMS390Z55 @ 50 MHz, on-chip FPU
cpu0: physical 20K instruction (64 b/l), 16K data (32 b/l): cache enabled
cpu1 at mainbus0: mid 10: TMS390Z50 v0 or TMS390Z55 @ 50 MHz, on-chip FPU
cpu1: physical 20K instruction (64 b/l), 16K data (32 b/l): cache enabled
obio0 at mainbus0
clock0 at obio0 slot 0 offset 0x200000: mk48t08 (eeprom)
timer0 at obio0 slot 0 offset 0x300000 delay constant 23
zs0 at obio0 slot 0 offset 0x100000 level 12 softpri 6
zstty0 at zs0 channel 0
zstty1 at zs0 channel 1
zs1 at obio0 slot 0 offset 0x0 level 12 softpri 6
kbd0 at zs1 channel 0 (console input)
ms0 at zs1 channel 1
fdc0 at obio0 slot 0 offset 0x700000 level 11 softpri 4: chip 82077
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
auxreg0 at obio0 slot 0 offset 0x800000
power0 at obio0 slot 0 offset 0xa01000 level 2
iommu0 at mainbus0 ioaddr 0xe0000000: version 0x1/0x1, page-size 4096, range 64MB
sbus0 at iommu0: clock = 25 MHz
dma0 at sbus0 slot 15 offset 0x400000: rev 2
esp0 at dma0 slot 15 offset 0x800000 level 4: ESP200, 40MHz, SCSI ID 7
scsibus0 at esp0: 8 targets, 8 luns per target
ledma0 at sbus0 slot 15 offset 0x400010: rev 2
le0 at ledma0 slot 15 offset 0xc00000 level 6: address 08:00:20:21:70:e4
le0: 8 receive buffers, 2 transmit buffers
bpp0 at sbus0 slot 15 offset 0x4800000 level 2 (ipl 3): rev 2
SUNW,DBRIe at sbus0 slot 14 offset 0x10000 level 9 not configured
AltaTechnology,HSIDRV at sbus0 slot 0 offset 0x800000 level 9 not configured
AltaTechnology,HSIDRV at sbus0 slot 1 offset 0x800000 level 9 not configured
cgsix0 at sbus0 slot 2 offset 0x0 level 9: SUNW,501-2325, 1152 x 900, rev 11 (console)
cgsix0: attached to /dev/fb
eccmemctl0 at mainbus0: version 0x0/0x2

>How-To-Repeat:
	Run Xsun twice (not simultaneously, of course) on a recent sun4m
	kernel.
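
	A possible way to narrow this down without X: the ddb traceback
	above shows the interrupt landing in udv_attach, called from
	uvm_mmap and sys_mmap, so the busy process was inside an mmap of
	a device at the time.  The sketch below maps a file, touches one
	byte, and unmaps it, repeated in separate open/close rounds.  It
	is untested on the affected machine.  The function names are
	hypothetical, it is only an assumption that the mapping involved
	is Xsun mapping the framebuffer, and a Python interpreter with
	the mmap module is assumed to be available.  Whether this
	reproduces the hang is unknown.

```python
import mmap
import os


def map_once(path, length):
    """Open path, map `length` bytes shared and read-only, read byte 0, unmap."""
    fd = os.open(path, os.O_RDONLY)
    try:
        m = mmap.mmap(fd, length, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
        try:
            # Touch the mapping so the first page is actually faulted in.
            return m[0]
        finally:
            m.close()
    finally:
        os.close(fd)


def map_repeatedly(path, length, rounds=2):
    """Run map_once in separate open/close rounds, like two Xsun sessions."""
    return [map_once(path, length) for _ in range(rounds)]
```

	For instance, map_repeatedly("/dev/fb", mmap.PAGESIZE) would
	exercise the device named in the cgsix0 attach message; the
	one-page length is an arbitrary choice, not what Xsun uses.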

>Fix:
	Unknown.
>Release-Note:
>Audit-Trail:
>Unformatted: