tech-kern archive


Re: zfs and device name changes



hi,

On Mon, Mar 30, 2026 at 8:13 PM Stephen Borrill <netbsd%precedence.co.uk@localhost> wrote:
>
> On Fri, 27 Mar 2026, Takashi YAMAMOTO wrote:
> > On Fri, Mar 27, 2026 at 9:17 PM Stephen Borrill <netbsd%precedence.co.uk@localhost> wrote:
> >>
> >> On Fri, 27 Mar 2026, Takashi YAMAMOTO wrote:
> >>> hi,
> >>>
> >>> On Thu, Mar 26, 2026 at 12:41 PM Taylor R Campbell <riastradh%netbsd.org@localhost> wrote:
> >>>>
> >>>>> Date: Tue, 24 Mar 2026 09:14:37 +0900
> >>>>> From: Takashi YAMAMOTO <yamt9999%gmail.com@localhost>
> >>>>>
> >>>>> the attached patch is my attempt to make zfs a bit more robust against
> >>>>> device name changes.
> >>>>> the identical patch is available at github too:
> >>>>> https://github.com/yamt/netbsd-src/commit/32283c2e362034301c3da218a05849c04ee20c2a
> >>>>>
> >>>>> while it seems to work as far as i've tested, i'd be happy if someone could review it,
> >>>>> as my knowledge of zfs (well, and of recent netbsd in general) is weak.
> >>>>
> >>>> I don't understand why all this new code is needed.  Doesn't zfs
> >>>> already have logic to scan all disks/partitions/wedges and find the
> >>>> vdevs by guid?
> >>>
> >>> which code are you talking about?
> >>> it's entirely possible i'm missing something as i'm new to the code base.
> >>>
> >>>>
> >>>> I am under the impression that /etc/zfs/zpool.cache may bypass the
> >>>> scan so this doesn't work in some circumstances, but in my years of
> >>>> using zfs on various machines with frequent device renumbering of cgd
> >>>> volumes and dkN wedges, I have never encountered this type of trouble
> >>>> myself, and I'm not sure what I'm doing differently.
> >>>
> >>> do you mean zfs finds vdevs after renumbering without zpool import?
> >>> that doesn't match my experience.
> >>> without this patch, i had to use zpool export/import after:
> >>> - modify gpt in a way affecting dk numbering
> >>> - swapping qemu disk images
> >>
> >> Naively:
> >>
> >> # zpool create tank mirror xbd2 xbd3 mirror xbd4 xbd5
> >> # zpool status
> >>    pool: tank
> >>   state: ONLINE
> >>    scan: none requested
> >> config:
> >>
> >>          NAME        STATE     READ WRITE CKSUM
> >>          tank        ONLINE       0     0     0
> >>            mirror-0  ONLINE       0     0     0
> >>              xbd2    ONLINE       0     0     0
> >>              xbd3    ONLINE       0     0     0
> >>            mirror-1  ONLINE       0     0     0
> >>              xbd4    ONLINE       0     0     0
> >>              xbd5    ONLINE       0     0     0
> >>
> >> errors: No known data errors
> >> # halt -p
> >>
> >> ** Remove xbd1 to simulate failed/disconnected disk
> >> ** means xbd2 -> xbd1, xbd3 -> xbd2, etc.
> >>
> >> After boot:
> >>
> >> # zpool status
> >>    pool: tank
> >>   state: UNAVAIL
> >> status: One or more devices could not be opened.  There are insufficient
> >>          replicas for the pool to continue functioning.
> >> action: Attach the missing device and online it using 'zpool online'.
> >>     see: http://illumos.org/msg/ZFS-8000-3C
> >>    scan: none requested
> >> config:
> >>
> >>          NAME                     STATE     READ WRITE CKSUM
> >>          tank                     UNAVAIL      0     0     0
> >>            mirror-0               UNAVAIL      0     0     0
> >>              6289893268167966748  FAULTED      0     0     0  was /dev/xbd2
> >>              4017376292647041077  FAULTED      0     0     0  was /dev/xbd3
> >>            mirror-1               UNAVAIL      0     0     0
> >>              4378765686596708079  FAULTED      0     0     0  was /dev/xbd4
> >>              6863498524284650610  UNAVAIL      0     0     0  was /dev/xbd5
> >>
> >> After yamt's patch:
> >>
> >> dmesg shows:
> >> ZFS WARNING: vdev guid mismatch for /dev/xbd2, actual 37c09674044d7835
> >> expected 574a309a2445c01c
> >> ZFS: trying to find a vdev (/dev/xbd2) by guid 574a309a2445c01c
> >> ZFS WARNING: vdev guid mismatch for /dev/xbd3, actual 3cc48031384426ef
> >> expected 37c09674044d7835
> >> ZFS: trying to find a vdev (/dev/xbd3) by guid 37c09674044d7835
> >> ZFS WARNING: vdev guid mismatch for /dev/xbd4, actual 5f400b8b20678472
> >> expected 3cc48031384426ef
> >> ZFS: trying to find a vdev (/dev/xbd4) by guid 3cc48031384426ef
> >> ZFS: trying to find a vdev (/dev/xbd5) by guid 5f400b8b20678472
> >
> > my patch doesn't scan xbd disks, as xbd isn't included in the list
> > (see device_is_eligible_for_vdev).
> > if you add it, it should be found.
>
> Indeed. I should have looked at the patch more closely!
>
> After adding xbd to the list then, if I remove xbd1, dmesg shows:
>
> ZFS WARNING: vdev guid mismatch for /dev/xbd2, actual 37c09674044d7835
> expected 574a309a2445c01c
> ZFS: trying to find a vdev (/dev/xbd2) by guid 574a309a2445c01c
> ZFS: vdev 574a309a2445c01c found: xbd1 (bdev 142:1)
> ZFS WARNING: vdev guid mismatch for /dev/xbd3, actual 3cc48031384426ef
> expected 37c09674044d7835
> ZFS: trying to find a vdev (/dev/xbd3) by guid 37c09674044d7835
> ZFS: vdev 37c09674044d7835 found: xbd2 (bdev 142:2)
> ZFS WARNING: vdev guid mismatch for /dev/xbd4, actual 5f400b8b20678472
> expected 3cc48031384426ef
> ZFS: trying to find a vdev (/dev/xbd4) by guid 3cc48031384426ef
> ZFS: vdev 3cc48031384426ef found: xbd3 (bdev 142:3)
> ZFS: trying to find a vdev (/dev/xbd5) by guid 5f400b8b20678472
> ZFS: vdev 5f400b8b20678472 found: xbd4 (bdev 142:4)
> ZFS WARNING: vdev guid mismatch for /dev/xbd2, actual 37c09674044d7835
> expected 574a309a2445c01c
> ZFS: trying to find a vdev (/dev/xbd2) by guid 574a309a2445c01c
> ZFS: vdev 574a309a2445c01c found: xbd1 (bdev 142:1)
> ZFS WARNING: vdev guid mismatch for /dev/xbd3, actual 3cc48031384426ef
> expected 37c09674044d7835
> ZFS: trying to find a vdev (/dev/xbd3) by guid 37c09674044d7835
> ZFS: vdev 37c09674044d7835 found: xbd2 (bdev 142:2)
> ZFS WARNING: vdev guid mismatch for /dev/xbd4, actual 5f400b8b20678472
> expected 3cc48031384426ef
> ZFS: trying to find a vdev (/dev/xbd4) by guid 3cc48031384426ef
> ZFS: vdev 3cc48031384426ef found: xbd3 (bdev 142:3)
> ZFS: trying to find a vdev (/dev/xbd5) by guid 5f400b8b20678472
> ZFS: vdev 5f400b8b20678472 found: xbd4 (bdev 142:4)
>
> And the pool configures. It is slightly odd from a UI point of view that
> the old device names are still used (they are actually xbd1-xbd4 now).
>
> # zpool status
>    pool: tank
>   state: ONLINE
>    scan: none requested
> config:
>
>          NAME        STATE     READ WRITE CKSUM
>          tank        ONLINE       0     0     0
>            mirror-0  ONLINE       0     0     0
>              xbd2    ONLINE       0     0     0
>              xbd3    ONLINE       0     0     0
>            mirror-1  ONLINE       0     0     0
>              xbd4    ONLINE       0     0     0
>              xbd5    ONLINE       0     0     0
>
> errors: No known data errors
>
> Thanks!

thank you for testing.

i agree on the ui oddness.
i guess it's the same on other OSes with non-path vdev lookup methods,
but i'm not sure, as i haven't used zfs on them.

anyway, i'm not going to commit this patch anytime soon, as simon might
come up with something better.

>
> --
> Stephen

