Subject: Compaq "Smart" Array controllers
To: None <netbsd-users@netbsd.org>
From: None <netbsd-ml@amalgam.dyndns.org>
List: netbsd-users
Date: 06/02/2003 13:01:34
My apologies for the length, but I wanted to be thorough in my report.
I originally reported this problem on current-users, but I have since
discovered that it affects more than one machine, and that it happens
on both the 1.6.1 release and -current.
I have been getting the following errors under even moderately heavy
loads on two servers I built recently:
ld0f: error writing fsbn 90048 of 90048-90063 (ld0 bn 5597760; cn 1388 tn 21 sn 21)
ld0f: error writing fsbn 90048 of 90048-90063 (ld0 bn 5597760; cn 1388 tn 21 sn 21)
ld0: dk_busy < 0
panic: disk_unbusy
stopped in pid 227 (tar) at cpu_Debugger+0x4: leave
stopped in pid 227 (tar) at cpu_Debugger+0x5: ret
stopped in pid 227 (tar) at panic+0xad: jmp panic+0x118
stopped in pid 227 (tar) at panic+0x118: addl $-0x8,%esp
stopped in pid 227 (tar) at panic+0x11b: pushl $0
Obviously the addresses and block numbers change each time, but this is
pretty much the signature of the crash.
The other panic I have seen under the same conditions is:
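Since the block numbers differ from crash to crash, it can help to pull the fields out of each error line so that reports from the two servers can be compared side by side. The following is only a minimal sketch: the regular expression is inferred from the single line format quoted above, and the field labels (`part`, `unit`, `first`, `last`) are hypothetical names chosen for illustration, not anything the kernel defines.

```python
import re

# Pattern inferred from the "ldNx: error writing fsbn ..." line quoted above.
# The group names are illustrative labels, not kernel terminology.
DISKERR = re.compile(
    r"(?P<part>\w+): error (?P<op>reading|writing) fsbn (?P<fsbn>\d+)"
    r"(?: of (?P<first>\d+)-(?P<last>\d+))?"
    r" \((?P<unit>\w+) bn (?P<bn>\d+);"
    r" cn (?P<cn>\d+) tn (?P<tn>\d+) sn (?P<sn>\d+)\)"
)

NUMERIC = ("fsbn", "first", "last", "bn", "cn", "tn", "sn")


def parse_diskerr(line):
    """Return the fields of one disk error line as a dict, or None if no match."""
    match = DISKERR.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in NUMERIC:
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


sample = ("ld0f: error writing fsbn 90048 of 90048-90063 "
          "(ld0 bn 5597760; cn 1388 tn 21 sn 21)")
info = parse_diskerr(sample)
```

Running `parse_diskerr` over a saved console log and grouping the results by `unit` and `op` would show whether the failures cluster on one partition or only on writes.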
panic: biodone already
Hardware
Server 1:
Compaq Proliant 1850R (PIII 600) 128MB RAM
Compaq Smart Array 3200
4 x 9.1 GB Ultra2 SCSI HD (Tried RAID 0+1 and also RAID 5)
Server 2:
Compaq Proliant 1600 (PII 450) 128MB RAM
Compaq Smart Array 2/SL
5 x 4.3 GB Ultra2 SCSI HD (Both RAID 0+1 and RAID 5 have been tried)
I have tried these servers with:
 - 1.6.1 and -current.
 - Array acceleration enabled and disabled.
But seemingly regardless of what I try, the above errors pop up under
any moderate disk activity, and the server folds.
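For anyone wanting to generate comparable disk activity on another machine, here is a minimal sketch of a load generator. It is not the workload used above (the ps output below shows tar, gzip and rm running concurrently); it simply has a few threads write, sync, read back and delete files. The function names, worker count and file sizes are all assumptions chosen for illustration.

```python
import os
import threading


def churn(directory, files=20, size=1 << 20):
    """Write, fsync, read back and delete `files` files of `size` bytes each."""
    block = os.urandom(size)
    written = 0
    for i in range(files):
        # Include the thread id so concurrent workers never share a file name.
        name = "churn-%d-%d" % (threading.get_ident(), i)
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data towards the controller
        with open(path, "rb") as f:
            if f.read() != block:
                raise IOError("read-back mismatch on %s" % path)
        os.remove(path)
        written += size
    return written


def run(directory, workers=3, files=20, size=1 << 20):
    """Run `workers` churn loops in parallel; return total bytes written."""
    totals = []
    lock = threading.Lock()

    def worker():
        n = churn(directory, files, size)
        with lock:
            totals.append(n)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals)
```

Calling `run()` with a directory on the affected file system would exercise the array; whether this is enough to trigger the panic is untested.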
I have one other server, a Proliant 2500 + Smart 2/DH, that has not
had any problems since installation last week, so I do not think it
is my installation approach[2], but I am open to any suggestions.
So far, my searching has only turned up a similar problem [1] with
a Mylex RAID card. But the cause of that problem is supposed to be
in the mlx.c driver, not the cac driver I am using.
Can anyone offer any enlightenment? Is this my mistake, or a bug?
Trace and PS info are attached below,
Michael
[1] http://mail-index.netbsd.org/current-users/2003/05/03/0003.html
[2] To be completely fair, this server does not see much in the way of
disk activity, so it could conceivably have the same problem only it
has not been in a situation to be affected by it yet.
Trace and PS output after crash:
panic: biodone already
Stopped at cpu_Debugger+0x4: leave
db> trace
cpu_Debugger(c4aa9488,6,ca9a2e40,c017b4f6,c3aa9488) at cpu_Debugger+0x4
panic(c0546622,c0a26200,c0a262b0,c0793ddc,c3aa9488) at panic+0xb8
biodone(c3aa9488,2000,100000,c0793ddc,c0a26200) at biodone+0x35
lddone(c3aa9488,c0793e08,c01b2c9c,c3aa9488) at lddone+0x05
ld_cac_done(c0a26200,c3aa9488,0,c01b2b2a,c09dda00) at ld_cac_done+0xc5
cac_ccb_done(c09dda00,ca9a2e40,c0793e68,0,c0a23e40) at cac_ccb_done+0x9f
cac_intr(c09dda00,0,c0790010,30,c0100010) at cac_intr+0x2a
Xintr_legacy10() at Xintr_legacy10+0xa8
--- interrupt ---
mpidle(c06d9560,0,c0793f6c,0,80000000) at mpidle
ltsleep(c06d93a0,4,c054de46,0,0) at ltsleep+0x207
uvm_scheduler(c078f010,78f000,798000,0,0) at uvm_scheduler+0x75
main(0,0,0,0,0) at main+0x69e
db> ps
PID   PPID  PGRP  UID  S  FLAGS    LWPS  COMMAND     WAIT
447   446   446   0    2  0x4002   1     gzip        pipdwt
446   364   446   0    2  0x4002   1     tar         biowait
380   409   409   0    2  0x4002   1     gzip        pipdwt
409   342   409   0    2  0x4002   1     tar         biowait
405   368   405   0    2  0x4002   1     rm          biowait
375   1     375   0    2  0x4002   1     getty       ttyin
364   1     364   0    2  0x4003   1     csh         pause
342   1     342   0    2  0x4003   1     csh         pause
368   1     368   0    2  0x4003   1     csh         pause
344   1     344   0    2  0        1     cron        nanosic
334   1     334   0    2  0        1     inetd       kqread
175   1     175   0    2  0        1     syslogd     biowait
125   1     125   0    2  0        1     dhclient    select
10    0     0     0    2  0x20200  1     aiodoned    aiodone
9     0     0     0    2  0x20200  1     ioflush
8     0     0     0    2  0x20200  1     reaper      reaper
7     0     0     0    2  0x20200  1     pagedaemon  pgdaemo
6     0     0     0    2  0x20200  1     lfs_writer  lfswrit
5     0     0     0    2  0x20200  1     pms0        pmsrese
4     0     0     0    2  0x20200  1     atapibus0   sccomp
3     0     0     0    2  0x20200  1     scsibus1    sccomp
1     0     1     0    2  0x4000   1     init        wait
0     -1    0     0    2  0x20200  1     swapper     schedule
db>