Subject: kern/21900: Compaq Smart Array Kernel Panic
To: None <gnats-bugs@gnats.netbsd.org>
From: None <root@amalgam.dyndns.org>
List: netbsd-bugs
Date: 06/16/2003 16:18:37
>Number:         21900
>Category:       kern
>Synopsis:       Compaq Smart Array Kernel Panic
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jun 16 07:19:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Charlie Root
>Release:        NetBSD 1.6Q (Built 04-03-2003)
>Organization:
	
>Environment:
	
	
System: NetBSD orion 1.6Q NetBSD 1.6Q (GENERIC) #0: Wed Apr 2 17:44:22 JST 2003 root@orion:/usr/local/netbsd-current/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:

I have been getting "biodone already" and "disk_unbusy" kernel panics
regularly on several machines on which I have tried to install NetBSD
1.6.1 and 1.6.1T. The panics occur during periods of heavy disk read/write
activity. Example crashes follow.

This report was generated on another machine, because the machines with the
problem are too unstable to be used to file a report.

ld0f: error writing fsbn 90048 of 90048-90063 (ld0 bn 5597760; cn 1388 tn 21 sn 21)
ld0f: error writing fsbn 90048 of 90048-90063 (ld0 bn 5597760; cn 1388 tn 21 sn 21)
ld0: dk_busy < 0
panic: disk_unbusy
stopped in pid 227 (tar) at    cpu_debugger+0x4:      leave
stopped in pid 227 (tar) at    cpu_debugger+0x5:      ret
stopped in pid 227 (tar) at    panic+0xad:    jmp   panic+0x118
stopped in pid 227 (tar) at    panic+0x118:   addl   $-0x8,%esp
stopped in pid 227 (tar) at    panic+0x11b:   pushl  $0

Hardware:

Server 1:
Compaq ProLiant 1850R (PIII 600)  128MB RAM
Compaq Smart Array 3200
4 x 9.1 GB Ultra2 SCSI HD (both RAID 0+1 and RAID 5 have been tried)

Server 2:
Compaq ProLiant 1600 (PII 450)  128MB RAM
Compaq Smart Array 2/SL
5 x 4.3 GB Ultra2 SCSI HD (both RAID 0+1 and RAID 5 have been tried)

Server 3:
Compaq ProLiant 2500 (PPro 200) 256MB RAM
Compaq Smart Array 2/DH
3 x 4.3 GB Ultra2 SCSI HD (RAID 1 with hot spare)

I have tried these servers with:
	- NetBSD 1.6.1 and -current
	- array acceleration both enabled and disabled

Hand-typed ("ten-finger") copy of the trace and ps output from one crash:

panic: biodone already
Stopped at    cpu_Debugger+0x4:		leave
db> trace
cpu_Debugger(c4aa9488,6,ca9a2e40,c017b4f6,c3aa9488) at cpu_Debugger+0x4
panic(c0546622,c0a26200,c0a262b0,c0793ddc,c3aa9488) at panic+0xb8
biodone(c3aa9488,2000,100000,c0793ddc,c0a26200) at biodone+0x35
lddone(c3aa9488,c0793e08,c01b2c9c,c3aa9488) at lddone+0x05
ld_cac_done(c0a26200,c3aa9488,0,c01b2b2a,c09dda00) at ld_cac_done+0xc5
cac_ccb_done(c09dda00,ca9a2e40,c0793e68,0,c0a23e40) at cac_ccb_done+0x9f
cac_intr(c09dda00,0,c0790010,30,c0100010) at cac_intr+0x2a
Xintr_legacy10() at Xintr_legacy10+0xa8
--- interrupt ---
mpidle(c06d9560,0,c0793f6c,0,80000000) at mpidle
ltsleep(c06d93a0,4,c054de46,0,0) at ltsleep+0x207
uvm_scheduler(c078f010,78f000,798000,0,0) at uvm_scheduler+0x75
main(0,0,0,0,0) at main+0x69e
db> ps
PID	PPID	PGRP	UID	S	FLAGS	LWPS	COMMAND		WAIT
447	446	446	0	2	0x4002	1	gzip		pipdwt
446	364	446	0	2	0x4002	1	tar		biowait
380	409	409	0	2	0x4002	1	gzip		pipdwt
409	342	409	0	2	0x4002	1	tar		biowait
405	368	405	0	2	0x4002	1	rm		biowait
375	1	375	0	2	0x4002	1	getty		ttyin
364	1	364	0	2	0x4003	1	csh		pause
342	1	342	0	2	0x4003	1	csh		pause
368	1	368	0	2	0x4003	1	csh		pause
344	1	344	0	2	0	1	cron		nanosic
334	1	334	0	2	0	1	inetd		kqread
175	1	175	0	2	0	1	syslogd		biowait
125	1	125	0	2	0	1	dhclient	select
10	0	0	0	2	0x20200	1	aiodoned	aiodone
9	0	0	0	2	0x20200	1	ioflush
8	0	0	0	2	0x20200	1	reaper		reaper
7	0	0	0	2	0x20200	1	pagedaemon	pgdaemo
6	0	0	0	2	0x20200	1	lfs_writer	lfswrit
5	0	0	0	2	0x20200	1	pms0		pmsrese
4	0	0	0	2	0x20200	1	atapibus0	sccomp
3	0	0	0	2	0x20200	1	scsibus1	sccomp
1	0	1	0	2	0x4000	1	init		wait
0	-1	0	0	2	0x20200	1	swapper 	schedule
db>


>How-To-Repeat:
	Heavy disk write activity on a system using a Compaq Smart Array 2/SL,
	2/DH, or 3200 seems to be all that is needed to trigger the kernel
	panic.
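	For anyone trying to reproduce this without the original tar/gzip
	workload, here is a minimal write-load sketch. It is an assumption
	on my part that fsync()-heavy sequential writes alone are enough to
	trigger the panic; the helper name write_load() and the block sizes
	are hypothetical. It writes nblocks blocks of blksize bytes to path,
	calling fsync() after each block so the data is pushed to the disk
	driver instead of accumulating in the buffer cache.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `nblocks` blocks of `blksize` bytes to `path`,
 * calling fsync() after every block so each write is flushed to the disk
 * driver. Returns the number of bytes written, or -1 on any error.
 */
static long long write_load(const char *path, size_t nblocks, size_t blksize)
{
    char *buf = malloc(blksize);
    if (buf == NULL)
        return -1;
    memset(buf, 0xA5, blksize);          /* arbitrary fill pattern */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long long total = 0;
    for (size_t i = 0; i < nblocks; i++) {
        size_t off = 0;
        while (off < blksize) {          /* handle short writes */
            ssize_t n = write(fd, buf + off, blksize - off);
            if (n < 0) {
                if (errno == EINTR)
                    continue;            /* retry if interrupted */
                total = -1;
                goto out;
            }
            off += (size_t)n;
        }
        total += (long long)blksize;
        if (fsync(fd) != 0) {            /* force the block out to disk */
            total = -1;
            goto out;
        }
    }
out:
    close(fd);
    free(buf);
    return total;
}
```

	Calling it in a loop (for example write_load(path, 1024, 65536),
	i.e. 64 MB per pass, unlinking the file between passes) from several
	processes at once roughly approximates the concurrent tar/gzip/rm
	load visible in the ps output above.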
>Fix:
	No known work-around.
>Release-Note:
>Audit-Trail:
>Unformatted: