Subject: Re: SBC probs
To: Scott Reynolds <scottr@edsi.org>
From: Hauke Fath <saw@sun0.urz.uni-heidelberg.de>
List: port-mac68k
Date: 09/06/1996 22:15:33
At 1:30 on 03.09.1996, Scott Reynolds wrote:
>On Fri, 30 Aug 1996, The Great Mr. Kurtz [David A. Gatwood] wrote:
>
>> To throw in another POV, I've had similar problems with my brand-spanking-
>> new (Quantum) internal drive on my PB145, both with NCRSCSI and sbc,
>> but the drive checks out fine.
>
>I think it's crucial to point out that the disk corruption that Chris is
>talking about is "physical," i.e. medium errors.  This is not the same as
>data corruption, which a couple people have reported.  The latter is why I
>even started working on the sbc SCSI driver in the first place (and is
>something I hope to continue soon).
>
>--scott


Hi,

just to add another data point to the collection: here's what I saw a
few minutes ago.

First this (I was sup'ing the -current userland; sup only fetched updated
files, so there was not much traffic on ppp0)...


## 6 Sep, 21h35
#
# -current sbc kernel of 25 Jun, splimp() = 4
# NetBSD 1.2_ALPHA (EIBE) #5: Tue Jun 25 19:11:57 GMT 1996
# booted to serial console
#
# Action: sup userland sources

sbc0: can not transfer more data
sbc0: aborting, but phase=DATA_OUT (reset)
sbc0: reset SCSI bus for TID=2 LUN=0
panic: ncr5380_scsi_cmd: polled request, abort failed
Stopped at      _Debugger+0x6:  unlk    a6
db> t
_Debugger(19600,7ab8,8c7c58,1013,8c7c70) + 6
_panic(7ab8,0,7,2,3) + 34
_ncr5380_scsi_cmd(6c06f80) + 80
_scsi_done(6c06f80,79d0,6c048ac,6c04800,6c048ac) + 6a
_ncr5380_scsi_cmd(6c04800) + 352
_ncr5380_scsi_cmd(6c04800) + 1950
_ncr5380_scsi_cmd(6c04800) + 650
_ncr5380_scsi_cmd(6c06f80) + 110
_scsi_execute_xs(6c06f80) + 28
_scsi_scsi_cmd(6c05de0,8c7d8c,6,6e7d000,2000,4,2710,761124,1001) + 8a
_sdstart(6c02200,6c022a6,761124) + 1ca
_sdstrategy(761124,8c7df0,283e2,8c7de8,0) + b4
_spec_strategy(8c7de8) + 24
_bwrite(761124,6c28b00,1,1,0) + d8
_ffs_update(8c7e44) + 182
_ufs_setattr(8c7e7c) + 234
_sys_utimes(6c28b00,8c7f88,8c7f80) + 162
_syscall(8a) + 10a
_trap0() + e
db>


And then, after rebooting...


root:~ # fsck /dev/rsd2g
sd2(sbc0:2:0): illegal request, data = b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db
 6d b6 db 6d b6 db 1a 00 04 00
sd2: could not mode sense (4); using fictitious geometry
Sep  6 21:54:25 se30 /netbsd: sd2(sbc0:2:0): illegal request, data = b6 db 6d b6
 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 1a 00 04 00
Sep  6 21:54:25 se30 /netbsd: sd2(sbc0:2:0): illegal request, data = b6 db 6d b6
 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 1a 00 04 00
Sep  6 21:54:25 se30 /netbsd: sd2: could not mode sense (4); using fictitious ge
ometry
Sep  6 21:54:25 se30 /netbsd: sd2: could not mode sense (4); using fictitious ge
ometry
** /dev/rsd2g
** Last Mounted on /usr/src
** Phase 1 - Check Blocks and Sizes
Sep  6 21:54:36 se30 named[85]: zoneref: Masters for secondary zone "206.129.in-
addr.arpa" unreachable
sbc0: pdma_in: timeout len=8192 count=8192
sd2(sbc0:2:0): medium error, info = 426678 (decimal), data = b6 db 6d b6 db 6d b
6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 08 06 82 b5
Sep  6 21:54:50 se30 /netbsd: sbc0: pdma_in: timeout len=8192 count=8192
Sep  6 21:54:50 se30 /netbsd: sbc0: pdma_in: timeout len=8192 count=8192
Sep  6 21:54:50 se30 /netbsd: sd2(sbc0:2:0): medium error, info = 426678 (decima
l), data = b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 08 06 82
b5
Sep  6 21:54:50 se30 /netbsd: sd2(sbc0:2:0): medium error, info = 426678 (decima
l), data = b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 08 06 82
b5

CANNOT READ: BLK 426464
CONTINUE? [yn] y

sd2(sbc0:2:0): medium error, info = 426678 (decimal), data = b6 db 6d b6 db 6d b
6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 08 06 82 b6
Sep  6 21:55:14 se30 /netbsd: sd2(sbc0:2:0): medium error, info = 426678 (decima
l), data = b6 db 6d sd2(sbc0:2:0): medium error, info = 426679 (decimal), data =
 b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 6d b6 db 08 06 82 b7

[continuing for a long time...]


This is now the third time this has happened here with MOs. Each time, a
SCSI-related crash preceded the 'medium error' messages, and each time the
crash happened during a write access.

The MO is mounted read-only during kernel & userland builds, and I have yet
to see medium errors under those circumstances.

I still don't see how a piece of software could corrupt the media of a
SCSI device -- but I can't help suspecting there must be a way...

Uh oh.



	hauke

---
"It's never straight up and down"     (DEVO)