Current-Users archive


Re: zpool import skips wedges due to a race condition



On Wed, Sep 08, 2021 at 06:38:02AM -0000, Michael van Elst wrote:
> alnsn%yandex.ru@localhost (Alexander Nasonov) writes:
> 
> >When I run zpool import, it launches 32 threads and opens 32 disks in
> >parallel, including cgd1 and dk24. But it can't open dk24 while
> >cgd1 is still open (it fails with EBUSY).
> 
> >I fixed it in the attached patch by running only one thread. It's
> >not the best approach but I'm not sure how to fix it properly.
> 
> 
> There are other issues with scanning devices in an arbitrary order,
> a parallel scan just makes it worse by adding randomness.
> 
> LVM tries to solve this with an optional filter for device names
> when scanning for physical volumes.
> 
> The root detection code tries to solve this by scanning twice, once
> for wedges, once for everything else, and by identifying wedges
> that alias a partition.
> 
> For a complete solution you would need to know all the device relationships
> (dkX on wdY, cgdN on dmN, etc., but also e.g. dkX on cgdN). That still leaves
> out hot-plug devices where "upper" devices appear late.
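A minimal sketch of that idea, assuming the relationships are already known. The input `stacked_on` is a hypothetical map from each device to the one it sits on; NetBSD does not hand this over in that form, and building it is the hard part. Devices that share a bottom disk end up in one group, uppermost first. A scanner could then give each group to a single thread and close each device before opening the next, while separate disks are still scanned in parallel. Hot-plugged upper devices that appear later are not covered.

```python
from collections import defaultdict


def chain(dev, stacked_on):
    """Return (bottom device, depth) for dev; depth 0 means nothing is below it."""
    seen = set()
    depth = 0
    while dev in stacked_on:
        if dev in seen:
            raise ValueError(f"stacking cycle at {dev}")
        seen.add(dev)
        dev = stacked_on[dev]
        depth += 1
    return dev, depth


def scan_groups(devices, stacked_on):
    """Group devices by the bottom disk they share, uppermost device first.

    stacked_on is a hypothetical map, e.g. {"dk24": "cgd1", "cgd1": "wd1"}.
    """
    groups = defaultdict(list)
    for dev in devices:
        bottom, depth = chain(dev, stacked_on)
        groups[bottom].append((depth, dev))
    # Deeper (upper) devices sort first; ties are broken by name for stable output.
    return {
        bottom: [dev for _, dev in sorted(members, key=lambda m: (-m[0], m[1]))]
        for bottom, members in sorted(groups.items())
    }


if __name__ == "__main__":
    stacked_on = {"dk0": "wd0", "dk1": "wd0", "cgd1": "wd1", "dk24": "cgd1"}
    devices = ["wd0", "dk0", "dk1", "wd1", "cgd1", "dk24", "wd2"]
    for bottom, order in scan_groups(devices, stacked_on).items():
        print(bottom, order)
    # wd0 ['dk0', 'dk1', 'wd0']
    # wd1 ['dk24', 'cgd1', 'wd1']
    # wd2 ['wd2']
```

Each inner list would be walked serially by one worker, so cgd1 and dk24 are never open at the same time.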

I see something similar in another context: when I shut down, shutdown
can stall on a system with dkX on cgdN, with

detaching dkX
detaching cgdN
detaching dkX    (same X)
(hang)

If the dice are rolled correctly, and cgdN gets the detach before dkX,
it shuts down properly...
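Here too the order is left to chance. A minimal sketch of computing it instead, again assuming a hypothetical `stacked_on` map: Python's `graphlib.TopologicalSorter` (Python 3.9+) yields every device exactly once and only after everything stacked on it. In that order dkX comes before cgdN and cannot be visited a second time. Whether upper-before-lower is the order NetBSD's detach path actually needs is an assumption here. The observation above is that shutdown succeeds when cgdN happens to be detached first.

```python
from graphlib import TopologicalSorter


def detach_order(stacked_on):
    """Return every device once, each after all devices stacked on top of it.

    stacked_on is a hypothetical map from an upper device to the one below it.
    A stacking cycle raises graphlib.CycleError.
    """
    graph = {}
    for upper, lower in stacked_on.items():
        graph.setdefault(upper, set())
        # TopologicalSorter emits a node only after its predecessors,
        # so make each upper device a predecessor of the device beneath it.
        graph.setdefault(lower, set()).add(upper)
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    order = detach_order({"dk3": "cgd0", "cgd0": "dm0", "dk0": "wd0"})
    print(order)  # dk3 before cgd0 before dm0; dk0 before wd0
```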


Cheers,

Patrick

