
Subject: NetBSD 1.4.2: asc_startcmd: reselect failed to interrupt? (retry)
To: None <port-pmax@netbsd.org>
From: Michael Olbricht <s_olbric@ira.uka.de>
List: port-pmax
Date: 07/01/2000 21:35:15
[ Sorry for the empty message; something went wrong while saving the text.
<teeth marks on keyboard>

one more try: ]

Hello,

I have installed NetBSD 1.4.2-release on two DECstations 5000/120.
Both of them have several SCSI hard drives (SCSI-2 and Fast-SCSI-2).
One of them has a QIC tape drive, the other a CD-ROM drive.

On both systems, sometimes the message 
"asc_startcmd: reselect failed to interrupt?" appears.
This seems to happen only when the machine is under heavy load,
such as compiling, proxy cache garbage collection, or /etc/daily.

Both machines had been running NetBSD 1.3.2 before without any
SCSI problems, and no hardware changes have happened for months.
The NetBSD upgrade was done by a clean install into empty filesystems
on different disks.

The output of dmesg for one machine:

<snip>

NetBSD 1.4.2 (GENERIC) #53: Wed Mar 15 10:14:32 EST 2000
    root@vlad:/usr/src/sys/arch/pmax/compile/GENERIC
DECstation 5000/120 (3MIN)
real mem  = 33554432
avail mem = 27136000
using 819 buffers containing 3354624 bytes of memory
mainbus0 (root)
cpu0 at mainbus0: cpu0: MIPS R3000 CPU Rev. 3.0 with MIPS R3010 FPC Rev. 3.0
cpu0: 64kb Instruction, 64kb Data, direct mapped cache
tc0 at mainbus0: 12.5 MHz clock
ioasic0 at tc0 slot 3 offset 0x0
le0 at ioasic0 offset 0xc0000: address 08:00:2b:27:97:fc
le0: 32 receive buffers, 8 transmit buffers
scc0 at ioasic0 offset 0x100000
scc1 at ioasic0 offset 0x180000
mcclock0 at ioasic0 offset 0x200000: mc146818 or compatible
asc0 at ioasic0 offset 0x300000: target 7
le1 at tc0 slot 1 offset 0x0: address 08:00:2b:23:6c:1e
le1: 32 receive buffers, 8 transmit buffers
cfb0 at tc0 slot 0 offset 0x0: (1024x864x8) (console)
Beginning old-style SCSI device autoconfiguration
rz0 at asc0 drive 0 slave 0 SEAGATE ST43400N rev 1028
rz0: 2777MB, 2737 cyl, 21 head, 98 sec, 512 bytes/sect x 5688447 sectors
rz1 at asc0 drive 1 slave 0 IBM DPES-31080 rev S31K
rz1: 1034MB, 4903 cyl, 4 head, 108 sec, 512 bytes/sect x 2118144 sectors
rz3 at asc0 drive 3 slave 0 SEAGATE ST43400N rev 1028
rz3: 2777MB, 2737 cyl, 21 head, 98 sec, 512 bytes/sect x 5688447 sectors
rz4 at asc0 drive 4 slave 0 DEC RZ24     (C) DEC rev 221B
rz4: 200MB, 1348 cyl, 8 head, 38 sec, 512 bytes/sect x 409792 sectors
tz0 at asc0 drive 5 slave 0 TANDBERG  TDC 3800 rev =03:
boot device: rz4
root on rz4a dumps on rz4b
root file system type: ffs
rz3: Recoverable error, blk 1669581
asc_startcmd: reselect failed to interrupt?
asc_startcmd: reselect failed to interrupt?
asc_startcmd: reselect failed to interrupt?
asc_startcmd: reselect failed to interrupt?
asc_startcmd: reselect failed to interrupt?
asc_startcmd: reselect failed to interrupt?

<snip>

rz4: boot, root and swap
rz1: /usr and swap
rz0 and rz3: mostly /var
(both to be phased out: too many grown defects :-( )

BTW: the tape drive seems to have become very slow compared to
NetBSD 1.3.2 (when used with dd or tar on /dev/rmth0)
BTW2: I could not make NetBSD boot from the IBM disk (rz1)
(messages: "open failed"
           "open 3/rz1/boot: 6"
           "can't load 'boot'" -
 yes, I had put the bootblocks on it),
but Ultrix had problems with this disk as well.

asc.c contains the lines:

if (regs->asc_status == SCSI_PHASE_MSG_IN) {
   printf("asc_startcmd: reselect failed to interrupt?\n");
   /* XXXX THIS NEEDS FIXING */
}

:-)
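For anyone reading along who is unfamiliar with the phase test: after a target reselects the host, the first bus phase is MESSAGE IN (the target sends its IDENTIFY message), so finding the bus in MESSAGE IN while starting a new command suggests a reselection the chip did not report via an interrupt. Below is a small standalone sketch of how the phase bits could be decoded, assuming the NCR 53C94-style layout where the low three status bits carry the MSG, C/D and I/O signals. The enum names, STATUS_PHASE_MASK, phase_name() and looks_like_reselect() are all hypothetical; they are not the definitions in NetBSD's asc driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, NOT NetBSD's ascreg.h definitions.
 * Assumption: bits 2..0 of the status byte are MSG, C/D, I/O,
 * as on the NCR 53C94. */
enum scsi_phase {
    PHASE_DATA_OUT = 0, /* MSG=0 C/D=0 I/O=0 */
    PHASE_DATA_IN  = 1, /* MSG=0 C/D=0 I/O=1 */
    PHASE_COMMAND  = 2, /* MSG=0 C/D=1 I/O=0 */
    PHASE_STATUS   = 3, /* MSG=0 C/D=1 I/O=1 */
    PHASE_MSG_OUT  = 6, /* MSG=1 C/D=1 I/O=0 */
    PHASE_MSG_IN   = 7  /* MSG=1 C/D=1 I/O=1 */
};

#define STATUS_PHASE_MASK 0x07u

/* Map the phase bits of a status byte to a readable name. */
static const char *phase_name(uint8_t status)
{
    switch (status & STATUS_PHASE_MASK) {
    case PHASE_DATA_OUT: return "DATA OUT";
    case PHASE_DATA_IN:  return "DATA IN";
    case PHASE_COMMAND:  return "COMMAND";
    case PHASE_STATUS:   return "STATUS";
    case PHASE_MSG_OUT:  return "MESSAGE OUT";
    case PHASE_MSG_IN:   return "MESSAGE IN";
    default:             return "reserved"; /* encodings 4 and 5 */
    }
}

/* True if the bus is in MESSAGE IN. Right after a selection attempt
 * this hints that a target reselected the host instead. Other status
 * bits are masked off before comparing. */
static bool looks_like_reselect(uint8_t status)
{
    return (status & STATUS_PHASE_MASK) == PHASE_MSG_IN;
}
```

This sketch masks off the non-phase bits before comparing; the excerpt quoted above compares asc_status directly, and whether that matters depends on driver details not shown here.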

What I would like to know is:

- can data loss/corruption occur due to this problem?
- can this be caused by hardware problems (cabling, termination)?

I have not yet been able to try a newer kernel, but since I could not
find this problem mentioned in the mailing list archive, I assume
it has not been fixed.

Besides that, NetBSD-1.4.2 seems to be very stable on the pmaxen,
but the uptime values are low: too many power outages. ;-)

Bye
Michael Olbricht