Subject: Re: Completely useless report on disk lockups
To: Joseph A. Dacuma <jadacuma@ched.gov.ph>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: current-users
Date: 01/18/2007 21:38:52
On Thu, Jan 18, 2007 at 07:04:28AM +0800, Joseph A. Dacuma wrote:
> Hi Fujinaka San!
> 
> > Any hints on how to debug this would be appreciated, but I'm getting
> > fairly repeatable disk lockups with i386-current. With all the thread
> > changes I don't doubt that something odd is happening, but this is on
> > fairly new hardware that I'm not quite sure is working completely right.
> >
> > Any hints on how to debug this would be appreciated. Sometimes it drops
> > into the debugger, sometimes it just plain hangs. The kernel I built
> > yesterday seems fine.
> >
> 
> I also get a similar error, but on the stable branch. This is also an
> SMP machine:
> 
> NetBSD 3.1_STABLE (config_orange1) #0: Tue Jan  9 17:01:46 UTC 2007
>         root@tange.yagitnet.org:/usr/obj/sys/arch/i386/compile/config_orange1
> total memory = 511 MB
> avail memory = 496 MB
> BIOS32 rev. 0 found at 0xfdb50
> mainbus0 (root)
> mainbus0: Intel MP Specification (Version 1.1) (INTEL    440GX       )
> 
> ----snip----
> 
> Jan 16 03:52:50 tange /netbsd: sd1(ahc0:0:1:0):  Check Condition on CDB: 0x2a 00 01 9f 0d 5f 00 00 20 00
> Jan 16 03:52:50 tange /netbsd: SENSE KEY:  Aborted Command
> Jan 16 03:52:50 tange /netbsd: ASC/ASCQ:  SCSI Parity Error
> Jan 16 03:52:50 tange /netbsd: FRU CODE:  0x8
> Jan 16 03:52:50 tange /netbsd:
> Jan 16 03:52:50 tange /netbsd: sd1(ahc0:0:1:0):  Check Condition on CDB: 0x2a 00 01 f4 3b cd 00 00 20 00
> Jan 16 03:52:50 tange /netbsd: SENSE KEY:  Aborted Command
> 
> --snip--
> 
> I ran a Seagate disk utility and all tests were OK. All I remember is
> that the machine was compiling Seamonkey and running build.sh (two
> instances). It just paused, and then I saw on the first terminal that it
> had dropped into the debugger. The sad thing is I don't know how to
> reproduce the error.

This is most probably a hardware issue on your SCSI chain at the electrical
level. Heavy disk usage stresses the bus and causes this error. Check
terminators and connectors.
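
For reference, the CDBs in the quoted log can be decoded by hand: opcode
0x2a is WRITE(10), bytes 2-5 carry the logical block address (big-endian)
and bytes 7-8 the transfer length in blocks. The sketch below is only an
illustration; the names decode_write10 and Write10 are made up here and are
not part of any NetBSD tool. Both failed commands turn out to be ordinary
32-block writes to different parts of the disk, which fits a bus-level
problem better than a single bad spot on the media.

```python
from typing import NamedTuple


class Write10(NamedTuple):
    """Fields of interest from a WRITE(10) CDB (hypothetical helper type)."""
    opcode: int
    lba: int      # logical block address, CDB bytes 2-5, big-endian
    blocks: int   # transfer length in blocks, CDB bytes 7-8, big-endian


def decode_write10(cdb_hex: str) -> Write10:
    """Decode a 10-byte WRITE(10) CDB given as space-separated hex bytes."""
    # The kernel log prints the first byte with a "0x" prefix; drop it.
    cdb = bytes.fromhex(cdb_hex.replace("0x", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return Write10(cdb[0], lba, blocks)


# The two CDBs from the log excerpt quoted above.
for line in ("0x2a 00 01 9f 0d 5f 00 00 20 00",
             "0x2a 00 01 f4 3b cd 00 00 20 00"):
    w = decode_write10(line)
    print(f"LBA {w.lba}, {w.blocks} blocks")
# prints:
# LBA 27200863, 32 blocks
# LBA 32783309, 32 blocks
```

Assuming 512-byte sectors (not stated in the log), each of these is a 16 KB
write.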

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference