Subject: Re: Hang (SCSI-related?) on 1.3.2
To: Gunnar Helliesen <gunnar@bitcon.no>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: port-i386
Date: 01/06/2000 16:45:26
On Thu, Jan 06, 2000 at 03:35:44PM +0100, Gunnar Helliesen wrote:
> I run NetBSD/i386 1.3.2 on a 400 MHz P-II with 512 MB ECC RAM and an
> Adaptec 2940UW with two 9 GB Quantum Viking disks. dmesg(8) output
> included at the end of this message.
> 
> The system has been running stably for over a year. Beginning a few weeks
> ago the machine started sometimes rebooting itself and sometimes hanging,
> in the latter case needing a toggle of the reset switch. The spontaneous 
> reboots are fairly rare, most often the machine hangs (about once every
> two or three days).
> 
> When the machine hangs, the kernel writes these messages on the console,
> about two messages/second or so:
> 
> 
> ahc0: target 0 synchronous at 20.0MHz, offset = 0x8
> ahc0: target 0 synchronous at 20.0MHz, offset = 0x8
> ahc0: target 0 synchronous at 20.0MHz, offset = 0x8
> ... and so on, for ever...

I think this message appears after target 0 has failed and the driver has reset it.

> 
> 
> On one occasion a few days ago it said "target 1", but on most hangs it
> writes the above message about "target 0". Targets 0 and 1 are identical
> Quantum disks. The machine also has a CD-ROM at target 2 and a DLT tape
> at target 3, but neither have been mentioned in the kernel messages on the
> console during a hang.
> 
> Whether the machine spontaneously reboots or hangs there are no traces 
> anywhere of what caused the problem. No crashdump and no entries in
> syslog.
> 
> When I hit the reset switch the machine boots without problems. No disk
> errors, no SCSI bus resets, no retries, nothing. The machine just boots as
> if nothing had happened. I would have figured that if this was a hardware
> problem that the problem would still be there when the machine
> rebooted? (Especially since I never cycle the power). If it's a software
> problem (ahc driver) why has the machine been running solid as a rock for
> over a year?
> 
> Ideas anyone? Where do I start? Replace the controller?

First check the cables and termination.
Then try removing the CD-ROM or DLT drive, if possible. It may also be that one
of the disks is going bad.

--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
--