Re: Aborted Command, ahd_timeout, panic: no dir
Hello. I think you might have been bitten by two different issues at
the same time. The first was a hiccup from the afflicted disk. The second
was that the disk, while failing, monopolized the SCSI bus so badly that
the RAID driver couldn't get to the other disk in order to do the write.
The ahd(4) driver should be able to recover from this, but I've just spent
all night working on a similar failure mode with the mpt(4) driver, which
currently deals very badly with such events. If the ahd(4) driver is at
all similar to the mpt(4) driver in how it handles disk timeouts,
then I'm pretty sure this is what's going on. If the event
recurs, then I'd say you have a bad disk, and the firmware on the
disk is not handling the errors gracefully in terms of what it's doing
with the SCSI bus.
On Nov 27, 12:15pm, Edgar Fuß wrote:
} Subject: Re: Aborted Command, ahd_timeout, panic: no dir
} > So the HBA was fully operational, but the disk didn't reset fully?
} Looks like it.
} > Assuming the SCSI bus reset code in ahd is OK
} I even pressed the red button (because I tried reboot in ddb and it got stuck
} in "syncing disks...").
} > - cable/connector not fully connected, broken wire/half-broken pin or
} > similar (ahd does SCSI over parallel SCSI cable, right?)
} I didn't touch the cabling.
} > In the latter case, I'd consider the power supply capacity, among
} > other sources of the problem. Hm, semi-brownout?
} The power supply has far more capacity than needed (and there are two of them).
} And everything is behind UPSes (one per PSU).
} But before that, there was that "tagged overlapped commands" error.
} Also, with one of two disks of a level 1 RAID failing, I wouldn't expect
} anything serious to happen.
>-- End of excerpt from Edgar Fuß