Subject: Re: Completely useless report on disk lockups
To: Joseph A. Dacuma <jadacuma@ched.gov.ph>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: current-users
Date: 01/18/2007 21:38:52
On Thu, Jan 18, 2007 at 07:04:28AM +0800, Joseph A. Dacuma wrote:
> Hi Fujinaka San!
>
> > Any hints on how to debug this would be appreciated, but I'm getting
> > fairly repeatable disk lockups with i386-current. With all the thread
> > changes I don't doubt that something odd is happening, but this is on
> > fairly new hardware that I'm not quite sure is working completely right.
> >
> > Any hints on how to debug this would be appreciated. Sometimes it drops
> > into the debugger, sometimes it just plain hangs. The kernel I built
> > yesterday seems fine.
> >
>
> I also get a similar error, though on the stable branch. This is also an
> SMP machine:
>
> NetBSD 3.1_STABLE (config_orange1) #0: Tue Jan 9 17:01:46 UTC 2007
> root@tange.yagitnet.org:/usr/obj/sys/arch/i386/compile/config_orange1
> total memory = 511 MB
> avail memory = 496 MB
> BIOS32 rev. 0 found at 0xfdb50
> mainbus0 (root)
> mainbus0: Intel MP Specification (Version 1.1) (INTEL 440GX )
>
> ----snip----
>
> Jan 16 03:52:50 tange /netbsd: sd1(ahc0:0:1:0): Check Condition on CDB: 0x2a 00 01 9f 0d 5f 00 00 20 00
> Jan 16 03:52:50 tange /netbsd: SENSE KEY: Aborted Command
> Jan 16 03:52:50 tange /netbsd: ASC/ASCQ: SCSI Parity Error
> Jan 16 03:52:50 tange /netbsd: FRU CODE: 0x8
> Jan 16 03:52:50 tange /netbsd:
> Jan 16 03:52:50 tange /netbsd: sd1(ahc0:0:1:0): Check Condition on CDB: 0x2a 00 01 f4 3b cd 00 00 20 00
> Jan 16 03:52:50 tange /netbsd: SENSE KEY: Aborted Command
>
> --snip--
>
> I ran a Seagate disk utility and all tests were OK. All I remember is that
> the machine was compiling Seamonkey and running build.sh (two
> instances). It just paused, then I saw on the first terminal that it had
> dropped into the debugger. The sad thing is I don't know how to recreate
> the same error again.
This is most probably a hardware issue on your SCSI chain at the electrical
level. Heavy disk usage stresses the bus and causes this error. Check the
terminators and connectors.
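
For reference, the CDB bytes in the log above can be decoded to see which
command was aborted. The following is only a minimal sketch, assuming the
standard 10-byte WRITE(10) layout (opcode 0x2a, big-endian 4-byte LBA,
2-byte transfer length); the function name decode_cdb10 is hypothetical and
not part of any NetBSD tool.

```python
import struct


def decode_cdb10(cdb_hex):
    """Decode a 10-byte SCSI CDB given as space-separated hex bytes.

    Assumes the READ(10)/WRITE(10) field layout: opcode, flags,
    4-byte big-endian LBA, group number, 2-byte big-endian transfer
    length, control byte.
    """
    cdb = bytes(int(tok, 16) for tok in cdb_hex.split())
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB, got %d bytes" % len(cdb))
    opcode, flags, lba, group, blocks, control = struct.unpack(">BBIBHB", cdb)
    return {"opcode": opcode, "lba": lba, "blocks": blocks}


if __name__ == "__main__":
    # CDB copied from the first log entry in the quoted report.
    info = decode_cdb10("0x2a 00 01 9f 0d 5f 00 00 20 00")
    print("opcode=0x%02x lba=0x%08x blocks=%d"
          % (info["opcode"], info["lba"], info["blocks"]))
```

Under that assumption, both logged commands are 32-block writes (opcode
0x2a), so the errors were reported while data was being sent to the disk.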
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference