Re: Aborted Command, ahd_timeout, panic: no dir

To: Edgar Fuß <ef%math.uni-bonn.de@localhost>, tech-kern%NetBSD.org@localhost
Subject: Re: Aborted Command, ahd_timeout, panic: no dir
From: Brian Buhrow <buhrow%nfbcal.org@localhost>
Date: Tue, 27 Nov 2012 04:00:52 -0800

        Hello.  I think you might have been bitten by two different issues at
the same time.  The first was a hiccup from the afflicted disk.  The second
was that the disk, while failing, monopolized the SCSI bus so badly that
the RAID driver couldn't get to the other disk in order to do the write.
The ahd(4) driver should be able to recover from this, but I've just spent
all night working on a similar failure mode with the mpt(4) driver, which
currently deals very badly with such events.  If the ahd(4) driver is at
all similar to the mpt(4) driver in how it handles disk timeouts, then
I'm pretty sure this is what's going on.  If the event recurs, then I'd
say you have a bad disk, and the firmware on the disk is not handling the
errors gracefully in terms of what it's doing with the SCSI bus.
On Nov 27, 12:15pm, Edgar Fuß wrote:
} Subject: Re: Aborted Command, ahd_timeout, panic: no dir
} > So the HBA was fully operational, but the disk didn't reset fully?
} Looks like it.
} 
} > Assuming the SCSI bus reset code in ahd is OK
} I even pressed the red button (because I tried reboot in ddb and it got stuck
} in "syncing disks...").
} 
} > - cable/connector not fully connected, broken wire/half-broken pin or
} > similar (ahd does SCSI over parallel SCSI cable, right?)
} I didn't touch the cabling.
} 
} > In the latter case, I'd consider the power supply capacity, among 
} > other sources of the problem. Hm, semi-brownout?
} The power supply has far more capacity than needed (and there are two of them).
} And everything is behind UPSes (one per PSU).
} 
} But before that, there was that "tagged overlapped commands" error.
} 
} Also, with one of two disks of a level 1 RAID failing, I wouldn't expect 
} anything serious to happen.
>-- End of excerpt from Edgar Fuß
