Subject: Re: sudden instability on 1.6.1
To: None <netbsd-users@NetBSD.org>
From: Steven M. Bellovin <smb@research.att.com>
List: netbsd-users
Date: 07/30/2003 14:45:59
In message <HIr3Dw.JKL@tac.nyc.ny.us>, Christos Zoulas writes:
>In article <20030728170854.040F37C92@berkshire.research.att.com>,
>Steve Bellovin <smb@research.att.com> wrote:
>>A 1.6.1 machine of mine has suddenly started crashing, for no apparent
>>reason. For the last crash, I deliberately left it not running X, so
>>I could see any messages:
>>
>>/tmp: got error 5 while accessing file system
>>panic: softdep_deallocate_dependencies: unrecovered I/O error
>
>Hmm, 5 = EIO. I see a few places in sd.c where EIO is returned, but
>they do not make a lot of sense to me. I'd add some printfs and see
>which one is causing it.
>
I will add some printfs. For now, I have made /tmp an MFS file system and
turned off softdep on the drive's other partition. When I did that, I
got the following in /var/log/messages when I tried to list the root
directory of that file system:
Jul 30 14:30:56 sigaba /netbsd: sd1(ahc1:0:1:0): SCB 14 - timed out while idle, SEQADDR == 0xa
Jul 30 14:30:57 sigaba /netbsd: SCSIRATE == 0x0
Jul 30 14:30:57 sigaba /netbsd: sd1(ahc1:0:1:0): Queuing a BDR SCB
Jul 30 14:30:57 sigaba /netbsd: sd1(ahc1:0:1:0): no longer in timeout, status = 0
'ls' reported 'Input/output error'.
It's clear that I have a hardware problem, though whether it's the
drive, the controller, or the cable is still unclear to me. There are
some NetBSD issues, too, such as why I didn't see any kernel error
messages when I had softdep enabled, or why the system panicked when
the softdep layer received EIO from the driver. The latter is
unacceptable, I think; it's reminiscent of 6th Edition Unix and
earlier, when you were told to buy error-free disk packs because the
drivers couldn't recover...
--Steve Bellovin, http://www.research.att.com/~smb