Subject: Re: ESP SCSI controller errors?
To: None <eeh@netbsd.org>
From: Havard Eidnes <he@netbsd.org>
List: port-sparc
Date: 01/27/2002 20:45:48
> Given that, it sounds like this is "drive rot" rather than "driver rot",
> but it would be much more interesting to learn what status the drive
> is returning in this circumstance.  It may be possible to work around
> it in the driver.

Hm, OK.  We may have to put the "drive upgrade" plan into action.
However, if you want to have us test a driver tweak to get something
more intelligible out of the driver I'd be interested in doing that.

> So exactly how long ago did this problem start?  And what's the date
> of the kernel sources?

The date of the kernel sources we run at the moment is January 17 2002.

As to when it started occurring, it's a little more difficult to tell.
Our rtty setup only keeps a month's worth of console logs.  The first
entry I find is from Jan 8, which says:

esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 1, resid 3c00
esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 1, resid 400
trap type 0x7: pc=0xf01b3484 npc=0xf01b3488 psr=110000c0<S,PS>
kernel: alignment fault trap
Stopped in pid 17254 (find) at  ufs_lookup+0x2dc:       lduh            [%l0 + 0x4], %o2
db>

At the time it was running a kernel from Jan 6 sources.

On a later boot, it said:

esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 1, resid 2000
bad block -939524096, ino 384384
bad block -2013259840, ino 384384
Jan  9 04:03:04 anker /netbsd: uid 16777216 comm find on /home: bad block
bad block -922746880, ino 384385
bad block 1744836527, ino 384385
bad block -905969664, ino 384386
bad block 1073747887, ino 384386
Jan  9 04:03:35 anker last message repeated 4 times
bad block -889192448, ino 384387
bad block -1879042129, ino 384387
bad block -872415232, ino 384388
bad block -1342171217, ino 384388
bad block -855638016, ino 384389
bad block 134223791, ino 384389
Jan  9 04:04:08 anker last message repeated 7 times

(didn't crash on that run), and on yet a later boot it said

esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 1, resid 400
trap type 0x7: pc=0xf01b3484 npc=0xf01b3488 psr=110000c1<S,PS>
kernel: alignment fault trap
Stopped in pid 2326 (find) at   ufs_lookup+0x2dc:       lduh            [%l0 + 0x4], %o2
db>

It was after this last crash that I ended up with

PARTIALLY ALLOCATED INODE I=384390
/dev/rsd1a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.

and a number of files and directories in lost+found.
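
The block numbers in the "bad block" lines above appear as signed decimal
values, which makes them hard to compare. A minimal sketch, assuming they
are 32-bit quantities, reprints a few of them (and the odd uid from the
same log) as unsigned hex. The helper name as_u32_hex is made up for this
illustration and is not part of any NetBSD tool:

```python
# Values copied from the console log above; the assumption that they are
# 32-bit quantities is mine, not something the log states.
BAD_BLOCKS = [
    -939524096, -2013259840,
    -922746880, 1744836527,
    -905969664, 1073747887,
]
LOGGED_UID = 16777216


def as_u32_hex(value: int) -> str:
    """Reinterpret a signed integer as an unsigned 32-bit value, in hex."""
    return f"0x{value & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    for block in BAD_BLOCKS:
        print(f"{block:>12d}  {as_u32_hex(block)}")
    print(f"uid {LOGGED_UID}  {as_u32_hex(LOGGED_UID)}")
```

This only changes how the numbers are displayed; it does not by itself
say what status the drive returned or where the bad data came from.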

Regards,

- Håvard