Current-Users archive


Re: WDCTL_RST failed for drive 0 / wd0: IDENTIFY failed (SATA autodetection issue after installation)



Thank you for the detailed report!

I've added these controllers to the quirk list. With ahcisata_pci.c rev 1.63
and later, the AHCISATA_EXTRA_DELAY kernel option is no longer required.

Thanks,
rin

On 2022/05/27 15:02, Matthias Petermann wrote:
Hello Rin,

The AHCISATA_EXTRA_DELAY option seems to fix the problem on both systems described below.

As discussed, here are the two dmesg outputs:

  - dmesg.nuc5.txt: from my NUC5 with AHCI and a Seagate hard disk.

  - dmesg.fujitsu.txt: from my Esprimo, with AHCI and wd2 (Seagate) and wd3 (WD).

A few more notes:

  - On the NUC, I had temporarily replaced the hard drive. Since then, the problem has become harder to reproduce. Before I "moved" the cables, I saw the problem on every boot. Now it occurs only sporadically (even with the original hard drive installed).

  - On the Esprimo, where the error occurs on almost every cold boot, both mechanical hard disks (wd2 and wd3) are always affected, as far as I have observed. The SSDs (wd0 and wd1), on the other hand, are always detected correctly.

More generally, the state of the cabling seems to contribute at least somewhat to the problem. On the NUC, unplugging and replugging the cables changed how often it occurs. On the Fujitsu, the problem has become more frequent since I installed a 4x SATA dock. That the problem is almost certainly related to the AHCI SATA probe delay is suggested by the fact that it occurs only with NetBSD 9.99.x, not with 9.2 or FreeBSD/Linux.

With the Fujitsu in particular, I had already swapped cables several times and tried various other things, because I initially suspected a pure cabling problem. At the moment, however, it seems to me that the cabling at most shifts the timing, and the timing is so close to the edge that the problem sometimes occurs and sometimes does not.

Kind regards
Matthias


On 24.05.2022 at 18:23, Rin Okuyama wrote:
Hi,

The recent change to the probe timing should affect only ahcisata(4).
Is your SATA controller driven by ahcisata(4)? If so,

(1) please try a kernel built with:

---
options AHCISATA_EXTRA_DELAY
---

If it works around the problem,

(2) please send us the full dmesg of your machine.

Then we can add your controller to the quirk list. Once it is
registered in the list, the AHCISATA_EXTRA_DELAY option is no longer
required.
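To illustrate the general idea of a per-controller quirk list, here is a minimal, hypothetical C sketch. It is NOT the actual ahcisata_pci.c code: the struct, the table, the function name needs_extra_delay(), and the PCI vendor/product IDs (0x1234, ...) are placeholders, and the real driver's mechanism may differ. It only shows the pattern described above: controllers listed in the table get the longer probe delay automatically, while a compile-time option (named after AHCISATA_EXTRA_DELAY here for illustration) forces it for every controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk entry; not the real ahcisata_pci.c structure. */
struct ahci_quirk {
	uint16_t vendor;	/* PCI vendor ID */
	uint16_t product;	/* PCI product ID */
	bool extra_delay;	/* wait longer before probing attached drives */
};

/* Placeholder entries only; these IDs do not name real controllers. */
static const struct ahci_quirk quirk_table[] = {
	{ 0x1234, 0x0001, true },
	{ 0x1234, 0x0002, true },
};

/* Compile-time override, analogous in spirit to the kernel option. */
#ifdef AHCISATA_EXTRA_DELAY
static const bool force_extra_delay = true;
#else
static const bool force_extra_delay = false;
#endif

/* Return true if this controller should get the longer probe delay. */
static bool
needs_extra_delay(uint16_t vendor, uint16_t product)
{
	/* A vendor ID of 0xffff means no device answered the config read. */
	assert(vendor != 0xffff);

	if (force_extra_delay)
		return true;

	for (size_t i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (quirk_table[i].vendor == vendor &&
		    quirk_table[i].product == product)
			return quirk_table[i].extra_delay;
	}
	return false;	/* unlisted controllers keep the default timing */
}
```

With this design, adding a controller to the table removes the need for the global option on that machine, which matches why sending the dmesg (to learn the controller's IDs) is the second step.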

Thanks,
rin

On 2022/05/25 0:49, Matthias Petermann wrote:
A small addendum: disabling Intel Platform Trust Technology in the BIOS did not help (I had read this suggestion in another post in the linked thread).

However, by plugging in an additional USB device (a mouse), I apparently introduced the delay the disk needed to complete WDCTL_RST without errors. This "workaround" is a shaky one, though, and an extremely close call. I don't even want to think about what I would do if this happened to a production server on a reboot.

Kind regards
Matthias


On 24.05.2022 at 17:31, Matthias Petermann wrote:

Hello all,

With one of the newer builds of 9.99 (unfortunately I can't narrow it down further), I have a problem on a NUC5 with a Seagate FireCuda SATA hard drive (hybrid HDD/SSD).

As long as I boot from the USB stick (for installation, and later for booting the kernel with root redirected to wd0), the hard drive wd0 is recognized correctly and works without problems.

When I boot directly from wd0, the boot loader runs fine and still loads the kernel into memory correctly. However, during initialization and hardware detection, wd0 fails to initialize:

```
WDCTL_RST failed for drive 0
wd0: IDENTIFY failed
```

This error pattern does not seem to be rare; the closest match is probably this post:

http://mail-index.netbsd.org/current-users/2022/03/01/msg042073.html

That post mentions recent changes to the SATA autodetection timing. This would fit my experience, since I had the problem neither with 9.1 (build from 02/16/2021) nor with older 9.99 versions. Does anyone know more about this timing change, and whether there are any known workarounds? I have several NUC5s with exactly this hard drive model that have been running stably for several years; it would be a shame to have to replace them for this reason.

Best regards
Matthias


