Current-Users archive


zpool import skips wedges due to a race condition



(forwarding to current-users because tech-misc appears to be inactive)

----- Forwarded message from Alexander Nasonov <alnsn%yandex.ru@localhost> -----

Date: Sun, 5 Sep 2021 22:16:48 +0100
From: Alexander Nasonov <alnsn%yandex.ru@localhost>
To: tech-misc%netbsd.org@localhost
Subject: zpool import skips wedges due to a race condition

zpool import reliably fails to detect a pool on my server. The pool
lives on cgd1:

# dkctl cgd1 listwedges
/dev/rcgd1: 1 wedge:
dk24: zcgdroot, 6688832954 blocks at 34, type: zfs

When I run zpool import, it launches 32 threads and opens 32 disks in
parallel, including cgd1 and dk24. But it can't open dk24 while
cgd1 is still open (the open fails with EBUSY).

I fixed it in the attached patch by running only one thread. It's
not the best approach but I'm not sure how to fix it properly.
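
One possible alternative to serializing everything, shown only as an
untested sketch: keep the thread pool and retry an open that fails with
EBUSY, on the assumption that the parent disk is held open only briefly
by another probe thread. The helper name open_retry_ebusy, the retry
count and the 10 ms delay are all hypothetical; nothing like this exists
in libzfs, and it has not been tried against the NetBSD wedge code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>

/*
 * Hypothetical helper, not part of libzfs: open `path`, retrying a
 * bounded number of times while the kernel reports EBUSY.  Any other
 * error is returned immediately with errno left as open(2) set it.
 */
int
open_retry_ebusy(const char *path, int flags, int max_tries)
{
	/* 10 ms between attempts; an arbitrary value for illustration. */
	const struct timespec delay = { 0, 10 * 1000 * 1000 };
	int fd = -1;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		fd = open(path, flags);
		if (fd != -1 || errno != EBUSY)
			break;	/* success, or an error a retry won't fix */
		if (i + 1 < max_tries)
			nanosleep(&delay, NULL);
	}
	return fd;
}
```

A caller would use it in place of a plain open(2), for example
open_retry_ebusy("/dev/rdk24", O_RDONLY, 5). If the device is still busy
after the last attempt, the function returns -1 with errno set to EBUSY,
so the caller can still skip the device as it does today.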

Alex

Index: libzfs_import.c
===================================================================
RCS file: /cvsroot/src/external/cddl/osnet/dist/lib/libzfs/common/libzfs_import.c,v
retrieving revision 1.7
diff -p -u -u -r1.7 libzfs_import.c
--- libzfs_import.c	28 Aug 2021 10:47:45 -0000	1.7
+++ libzfs_import.c	5 Sep 2021 20:50:35 -0000
@@ -1326,9 +1326,11 @@ skipdir:
 		 * double the number of processors; we hold a lot of
 		 * locks in the kernel, so going beyond this doesn't
 		 * buy us much.
+		 * XXX It's not a very smart idea to open all disks in
+		 * parallel because wedges on NetBSD can't be open while
+		 * a parent disk is open. For now, only run one thread.
 		 */
-		t = tpool_create(1, 2 * sysconf(_SC_NPROCESSORS_ONLN),
-		    0, NULL);
+		t = tpool_create(1, 1, 0, NULL);
 		for (slice = avl_first(&slice_cache); slice;
 		    (slice = avl_walk(&slice_cache, slice,
 		    AVL_AFTER)))


----- End forwarded message -----

