Subject: kern/9811: adw(4) hang problem in -current?(i386)
To: None <gnats-bugs@gnats.netbsd.org>
From: None <smd@ebone.net>
List: netbsd-bugs
Date: 04/06/2000 09:32:10
>Number: 9811
>Category: kern
>Synopsis: disk accesses time out and never recover
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Apr 06 02:55:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator: Sean Doran
>Release: current as of 1 Apr
>Organization:
>Environment:
System: NetBSD crasse.smd.ebone.net 1.4X NetBSD 1.4X (SCREAM) #0: Sat Apr 1 02:01:29 CEST 2000 smd@crasse.smd.ebone.net:/usr/src/sys/arch/i386/compile/SCREAM i386
adw0 at pci0 dev 19 function 0: AdvanSys ASB-3940U2W SCSI adapter
adw0: interrupting at irq 10
scsibus2 at adw0: 16 targets, 8 luns per target
...
scsibus2: waiting 2 seconds for devices to settle...
sd3 at scsibus2 target 1 lun 0: <IBM, DGHS18D, 03E0> SCSI3 0/direct fixed
sd3: 17501 MB, 8154 cyl, 20 head, 219 sec, 512 bytes/sect x 35843670 sectors
sd4 at scsibus2 target 6 lun 0: <QUANTUM, QM318000TD-SW, N491> SCSI2 0/direct fixed
sd4: 17366 MB, 8057 cyl, 20 head, 220 sec, 512 bytes/sect x 35566500 sectors
>Description:
sd3(adw0:1:0): timed out
sd3(adw0:1:0): timed out
sd3(adw0:1:0): timed out
and that's all she wrote
All accesses to the timed-out disk (sd3, and sometimes sd4) simply
block, as seen in the ps output below, which was taken after dropping
into ddb and issuing kill 0t1 and c.
scsictl has no effect.
Cycling power on the disk (unplug the SCA<->LVD 68-pin converter, wait,
replug) either does nothing or triggers:
sd3: respinning up disk
sd3(adw0:1:0): timed out
Occasionally the disk will hang with its busy LED lit;
usually not.
The hangs occur most often after the machine has been up some hours,
and so far never under particularly heavy load.
envstat shows nothing unusual thermally, and a hit of the reset
button or a quick power cycle of the entire machine will always
result in a perfectly happy controller/disk combination, for many hours.
I have done nothing unusual configuration-wise to the adaptor card,
and since the disks run normally for long periods of time under mixed
loads, I find it hard to think of how to blame hardware.
Unfortunately, this gives me maximum uptimes of around 30 hours,
since I cannot recover from the timed-out disk without an unclean shutdown.
  UID   PID  PPID CPU PRI NI   VSZ   RSS WCHAN  STAT TT       TIME COMMAND
    0     0     0   0 -18  0     0 15516 schedu DLs  ??    0:00.03 (swapper)
    0     1     0   0  10  0   276   240 wait   Is   ??    0:00.01 init
    0     2     0   0  10  0     0 15516 apmev  DL   ??    0:00.57 (apm0)
    0     3     0   0 -18  0     0 15516 daemon DL   ??    0:00.00 (pagedaemon)
    0     4     0   0 -18  0     0 15516 reaper DL   ??    0:00.11 (reaper)
    0     5     0   0  18  0     0 15516 syncer DL   ??    0:01.79 (ioflush)
    0   196     1   0  -2  0   256   608 vnlock Ds   ??    0:00.22 /usr/pkg/sbin
 3005  3267     1   0  -2  0  1760  2540 vnlock D    p0-   0:00.00 /usr/X11R6/bi
 3005   282     1   0  -2  0 15328 14164 vnlock D    p3-   1:47.11 /usr/pkg/lib/
 3005   283   282   0   2  0     0     0 -      Z    p3-   0:00.00 (netscape)
 3005   483     1  30  -5  0 16852 17156 scsipi D    p3- 333:19.41 /usr/pkg/lib/
 3005   484   483   0   2  0     0     0 -      Z    p3-   0:00.00 (netscape)
 3005   611     1   0  -2  0 12456 11620 vnlock D    p3-   0:17.51 /usr/pkg/lib/
 3005   612   611   0   2  0     0     0 -      Z    p3-   0:00.00 (netscape)
 3005  1909     1   0  -5  0 19568 20752 biowai D    p3-   0:19.69 /usr/pkg/lib/
 3005  1910  1909   0   2  0     0     0 -      Z    p3-   0:00.00 (netscape)
    0  3266     1   0  -2  0   364   240 vnlock D    p5-   0:00.01 -csh
 3005   757     1   0  -5  0   528   296 biowai Ds+  p7    0:00.03 es
 3005  1081   757  29  -2  4 22584 23044 vnlock DNE  p7  303:54.65 /usr/pkg/lib/
 3005  1082  1081   0  28  0     0     0 -      Z    p7    0:00.00 (netscape)
 3005  1567   757   3  31  0     0     0 -      Z    p7    0:00.00 (netscape)
    0   206     1   5  -2  0   632   276 vnlock D    E0-  68:34.98 ./rc5des
    0  3268     1   0  -2  0    24   104 vnlock D    E0-   0:00.00 /usr/bin/su
    0  3269     1   0  10  0   396   192 wait   Ss   E0    0:00.00 /bin/sh
    0  3271  3269   0  28  0   312   188 -      R+   E0    0:00.00 ps -axl
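
For anyone triaging a similar hang, the listing above can be summarised
mechanically. The following is only a rough sketch in Python, not part of
the original report: the function name blocked_by_wchan and the
DISK_WCHANS set are hypothetical, and treating vnlock, biowai and scsipi
as the "stuck on disk" wait channels is an assumption based solely on the
rows shown here. It groups processes in uninterruptible sleep (STAT
beginning with D) by their WCHAN column.

```python
from collections import defaultdict

# Wait channels seen in the listing above. Treating exactly these three as
# "blocked on the hung disk" is an assumption made for this sketch.
DISK_WCHANS = {"vnlock", "biowai", "scsipi"}


def blocked_by_wchan(ps_text):
    """Group PIDs of uninterruptible ('D...') processes by wait channel.

    Expects text in the 13-column `ps -axl` layout shown in this report:
    UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
    """
    groups = defaultdict(list)
    for line in ps_text.splitlines():
        # Split into at most 13 fields so COMMAND may contain spaces.
        fields = line.split(None, 12)
        if len(fields) < 13 or fields[0] == "UID":
            continue  # skip the header and anything malformed
        pid, wchan, stat = int(fields[1]), fields[8], fields[9]
        if stat.startswith("D") and wchan in DISK_WCHANS:
            groups[wchan].append(pid)
    return dict(groups)


# A few rows copied from the listing above, used as sample input.
SAMPLE = """\
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
0 5 0 0 18 0 0 15516 syncer DL ?? 0:01.79 (ioflush)
3005 483 1 30 -5 0 16852 17156 scsipi D p3- 333:19.41 /usr/pkg/lib/
3005 1909 1 0 -5 0 19568 20752 biowai D p3- 0:19.69 /usr/pkg/lib/
0 3266 1 0 -2 0 364 240 vnlock D p5- 0:00.01 -csh
0 3271 3269 0 28 0 312 188 - R+ E0 0:00.00 ps -axl
"""

if __name__ == "__main__":
    for wchan, pids in sorted(blocked_by_wchan(SAMPLE).items()):
        print(wchan, pids)
```

Kernel threads such as ioflush are also in a D state but sleep on their
own channels, which is why the sketch filters on WCHAN rather than on
STAT alone.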
>How-To-Repeat:
boot
run normally
one of the disks hanging off the adw(4) controller times out
try to access it
see process hang
particularly fun when the disk that times out has /usr/pkg
and /usr/pkg/sbin is touched by root's csh startup rehash... -:(
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: