Current-Users archive
Unreliable GPT detection
After an upgrade of the Rock64-based satellite I have at our cabin, it
could no longer mount its "dk0" wedge from the USB connected external
disk. (It runs off internal MMC, while the external disk supports its
function as a MooseFS storage node.) Attaching the disk used to look like
this (sample from a different system, still running the older -current):
[Sat Feb 8 13:26:34 CET 2025] umass0 at uhub4 port 2 configuration 1 interface 0
[Sat Feb 8 13:26:34 CET 2025] umass0: Western Digital (0x1058) Elements SE 25FE (0x25fe), rev 2.10/10.19, addr 4
[Sat Feb 8 13:26:34 CET 2025] umass0: using SCSI over Bulk-Only
[Sat Feb 8 13:26:34 CET 2025] scsibus0 at umass0: 2 targets, 2 luns per target
[Sat Feb 8 13:26:34 CET 2025] sd0 at scsibus0 target 0 lun 0: <WD, Elements SE 25FE, 1019> disk fixed
[Sat Feb 8 13:26:34 CET 2025] sd0(umass0:0:0:0): not ready, data = 00 00 00 00 04 01 00 00 00 00
[Sat Feb 8 13:26:34 CET 2025] sd0: drive offline
[Sat Feb 8 13:26:37 CET 2025] sd0: fabricating a geometry
[Sat Feb 8 13:26:37 CET 2025] sd0: GPT GUID: 33c631e7-ad51-4fe3-9071-80c9ca4e7770
[Sat Feb 8 13:26:37 CET 2025] dk0 at sd0: "Elements SE", 1953454080 blocks at 2048, type: ffs
[Sat Feb 8 13:26:37 CET 2025] uk0 at scsibus0 target 0 lun 1: <WD, SES Device, 1019> enclosure services fixed
Note how it takes a little while for the disk to become ready, as it is
set up to only spin while it is being actively used. The GPT is
discovered approximately three seconds after sd0 is initially probed.
With a fresh -current, the behaviour is different:
[Sat Feb 8 10:41:55 CET 2025] umass0 at uhub2 port 1 configuration 1 interface 0
[Sat Feb 8 10:41:55 CET 2025] umass0: Western Digital (0x1058) Elements SE 25FE (0x25fe), rev 2.10/10.19, addr 2
[Sat Feb 8 10:41:55 CET 2025] umass0: using SCSI over Bulk-Only
[Sat Feb 8 10:41:55 CET 2025] scsibus0 at umass0: 2 targets, 2 luns per target
[Sat Feb 8 10:41:55 CET 2025] sd0 at scsibus0 target 0 lun 0: <WD, Elements SE 25FE, 1019> disk fixed
[Sat Feb 8 10:41:55 CET 2025] sd0(umass0:0:0:0): Check Condition on CDB: 0x00 00 00 00 00 00
[Sat Feb 8 10:41:55 CET 2025] SENSE KEY: Not Ready
[Sat Feb 8 10:41:55 CET 2025] ASC/ASCQ: Logical Unit Is In Process Of Becoming Ready
[Sat Feb 8 10:41:55 CET 2025] sd0: drive offline
[Sat Feb 8 10:41:55 CET 2025] uk0 at scsibus0 target 0 lun 1: <WD, SES Device, 1019> enclosure services fixed
We get more information - the disk is "in process of becoming ready",
or, in fact, spinning up. However, the GPT is never discovered. This
is because the failure of the call to scsipi_test_unit_ready(), at
sys/dev/scsipi/sd.c:319, caused by the disk not being completely spun up
yet, blocks the call to dkwedge_discover() at sys/dev/scsipi/sd.c:355.
In this situation, I can't get the GPT recognized. In fact, gpt(8),
pointed at sd0, reports that there is nothing there, even though sd0
is, at that time, fully online and working fine in its own right.
As a quick workaround, to get my particular installation working again,
I modified scsipi_test_unit_ready() to be willing to retry a few times,
if called with XS_CTL_DISCOVERY, using kpause() to wait a second between
retries, and I get this:
[Sat Feb 8 20:01:30 CET 2025] umass0 at uhub5 port 4 configuration 1 interface 0
[Sat Feb 8 20:01:30 CET 2025] umass0: Western Digital (0x1058) Elements SE 25FE (0x25fe), rev 2.10/10.19, addr 4
[Sat Feb 8 20:01:30 CET 2025] umass0: using SCSI over Bulk-Only
[Sat Feb 8 20:01:30 CET 2025] scsibus0 at umass0: 2 targets, 2 luns per target
[Sat Feb 8 20:01:30 CET 2025] sd0 at scsibus0 target 0 lun 0: <WD, Elements SE 25FE, 1019> disk fixed
[Sat Feb 8 20:01:30 CET 2025] sd0(umass0:0:0:0): Check Condition on CDB: 0x00 00 00 00 00 00
[Sat Feb 8 20:01:30 CET 2025] SENSE KEY: Not Ready
[Sat Feb 8 20:01:30 CET 2025] ASC/ASCQ: Logical Unit Is In Process Of Becoming Ready
[Sat Feb 8 20:01:31 CET 2025] sd0(umass0:0:0:0): Check Condition on CDB: 0x00 00 00 00 00 00
[Sat Feb 8 20:01:31 CET 2025] SENSE KEY: Not Ready
[Sat Feb 8 20:01:31 CET 2025] ASC/ASCQ: Logical Unit Is In Process Of Becoming Ready
[Sat Feb 8 20:01:32 CET 2025] sd0(umass0:0:0:0): Check Condition on CDB: 0x00 00 00 00 00 00
[Sat Feb 8 20:01:32 CET 2025] SENSE KEY: Not Ready
[Sat Feb 8 20:01:32 CET 2025] ASC/ASCQ: Logical Unit Is In Process Of Becoming Ready
[Sat Feb 8 20:01:33 CET 2025] sd0: fabricating a geometry
[Sat Feb 8 20:01:33 CET 2025] sd0: 931 GB, 953837 cyl, 64 head, 32 sec, 512 bytes/sect x 1953458176 sectors
[Sat Feb 8 20:01:33 CET 2025] sd0: fabricating a geometry
[Sat Feb 8 20:01:33 CET 2025] sd0: GPT GUID: 33c631e7-ad51-4fe3-9071-80c9ca4e7770
[Sat Feb 8 20:01:33 CET 2025] dk0 at sd0: "Elements SE", 1953454080 blocks at 2048, type: ffs
[Sat Feb 8 20:01:33 CET 2025] uk0 at scsibus0 target 0 lun 1: <WD, SES Device, 1019> enclosure services fixed
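A minimal user-space sketch of that kind of retry loop, for illustration
only and not the actual kernel change. Every name in it (sim_disk,
sim_test_unit_ready, sim_pause_one_second, sim_test_unit_ready_retry) is
hypothetical; the "discovery" flag stands in for XS_CTL_DISCOVERY, and the
pause helper only counts waits where the kernel version would call kpause().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a disk that needs a few polls to spin up. */
struct sim_disk {
	int polls_until_ready;	/* TEST UNIT READY fails while this is > 0 */
	int pauses;		/* number of one-second waits taken */
};

/* Simulated TEST UNIT READY: 0 when ready, -1 while still spinning up. */
static int
sim_test_unit_ready(struct sim_disk *d)
{
	if (d->polls_until_ready > 0) {
		d->polls_until_ready--;
		return -1;
	}
	return 0;
}

/* Stand-in for a one-second kpause(); it only counts the wait. */
static void
sim_pause_one_second(struct sim_disk *d)
{
	d->pauses++;
}

/*
 * Retry only during discovery; any other caller gets a single attempt,
 * so the previous behaviour is unchanged for it.
 */
static int
sim_test_unit_ready_retry(struct sim_disk *d, bool discovery, int max_tries)
{
	int error = -1;
	int tries;

	assert(max_tries > 0);
	tries = discovery ? max_tries : 1;
	for (int i = 0; i < tries; i++) {
		error = sim_test_unit_ready(d);
		if (error == 0)
			break;
		if (i + 1 < tries)	/* no point waiting after the last try */
			sim_pause_one_second(d);
	}
	return error;
}
```

A disk that fails three polls and then answers, as in the log above, costs
three pauses in this model. A unit that never becomes ready costs
max_tries - 1 pauses before the call gives up.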
I guess the better solution would involve recognizing the "in process of
becoming ready" situation, and being willing to wait for the device
then. With my naïve change, plugging in e.g. an empty SD reader causes
the kernel thread that's detecting and attaching it to spend a few
seconds hoping that the missing SD will magically show up. :)
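One way the outlined solution might look, again as a user-space sketch
under stated assumptions rather than kernel code: wait only when the sense
data says Not Ready with ASC/ASCQ 04h/01h ("Logical Unit Is In Process Of
Becoming Ready"), and give up at once on anything else, such as 3Ah/00h
("Medium Not Present") from an empty card reader. The sim_* names and the
sense struct layout are hypothetical; the numeric codes are the standard
SCSI values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKEY_NOT_READY		0x02	/* sense key: Not Ready */
#define ASC_LUN_NOT_READY	0x04	/* ASC: logical unit not ready */
#define ASCQ_BECOMING_READY	0x01	/* ASCQ: in process of becoming ready */
#define ASC_MEDIUM_NOT_PRESENT	0x3a	/* ASC: medium not present */

/* Hypothetical, simplified view of the interesting sense fields. */
struct sim_sense {
	uint8_t key, asc, ascq;
};

/* Hypothetical simulated unit. */
struct sim_unit {
	struct sim_sense sense;	/* sense data returned while not ready */
	int polls_until_ready;	/* negative: never becomes ready */
	int pauses;		/* number of one-second waits taken */
};

/* True only for the "spinning up, try again soon" condition. */
static bool
sim_sense_is_becoming_ready(const struct sim_sense *s)
{
	return s->key == SKEY_NOT_READY &&
	    s->asc == ASC_LUN_NOT_READY &&
	    s->ascq == ASCQ_BECOMING_READY;
}

/* Simulated TEST UNIT READY: 0 when ready, else -1 with sense filled in. */
static int
sim_poll(struct sim_unit *u, struct sim_sense *out)
{
	if (u->polls_until_ready == 0)
		return 0;
	if (u->polls_until_ready > 0)
		u->polls_until_ready--;
	*out = u->sense;
	return -1;
}

/* Wait for a spinning-up unit, but fail fast on any other error. */
static int
sim_wait_ready(struct sim_unit *u, int max_tries)
{
	struct sim_sense s;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		if (sim_poll(u, &s) == 0)
			return 0;
		if (!sim_sense_is_becoming_ready(&s))
			return -1;	/* e.g. no medium: waiting cannot help */
		if (i + 1 < max_tries)
			u->pauses++;	/* stands in for a one-second pause */
	}
	return -1;
}
```

In this model the spinning-up disk still gets its few seconds, while the
empty reader is rejected on the first poll without any pause.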
Thoughts? I'll probably take a stab at the outlined solution if noöne
has a better idea for me, but it may take a while. Round tuits are hard
to come by these days, unfortunately.
I haven't looked for the code change that introduced the non-detection
of GPT. My cabin satellite is updated rather infrequently, and what
was running there until this recent upgrade is 11 months old.
-tih
--
Puppies are cute.