NetBSD-Bugs archive
kern/59885: zfs: unlink/rm is slow to delete last link because it always zil_commits
>Number: 59885
>Category: kern
>Synopsis: zfs: unlink/rm is slow to delete last link because it always zil_commits
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jan 03 21:50:00 +0000 2026
>Originator: Taylor R Campbell
>Release: current, 11, 10, 9, ...
>Organization:
The NetZFS Slowdowndation, Inc.
>Environment:
>Description:
We have a local change to zfs to commit the zil whenever a zfs
vnode is reclaimed, which usually happens when deleting the
last link to a file:
	/*
	 * Operation zfs_znode.c::zfs_zget_cleaner() depends on this
	 * zil_commit() as a barrier to guarantee the znode cannot
	 * get freed before its log entries are resolved.
	 */
	if (zfsvfs->z_log)
		zil_commit(zfsvfs->z_log, zp->z_id);
This is required because the logic to write data for the file
to the log (zfs_get_data), queued up by past writes to the
file, relies on acquiring a reference to the vnode as indexed
by the zfs object id (equivalent of inode number) and using the
struct znode before it is freed.
That logic might run after the vnode has gone through
VOP_RECLAIM, but NetBSD's vnode life cycle treats reclamation
as final and forbids acquiring new references or even looking
up the vnode by its object id via vcache(9), and
zfs_netbsd_reclaim unconditionally frees the struct znode with
zfs_znode_free immediately afterward (all paths out of
zfs_zinactive go, either directly or via zfs_rmnode and
sometimes then via zfs_znode_delete, through zfs_znode_free):
	if (zp->z_sa_hdl == NULL)
		zfs_znode_free(zp);
	else
		zfs_zinactive(zp);
Committing the zil first avoids this trouble. But committing
the zil is costly (requires writing all pending transactions to
disk and flushing the disk cache and updating root pointers and
so on), much costlier than just logging a file operation like
unlink.
And since removing the last link to a file causes its vnode to
be reclaimed synchronously, essentially every rm(1) or
equivalent (in the absence of multiple hard links to a file)
triggers this logic, making it very slow.
>How-To-Repeat:
rm -rf /large/directory/tree
>Fix:
It was a huge improvement to the reliability and
maintainability of NetBSD's vnode life cycle that we began to
forbid reviving vnodes from the dead about a decade ago, so
reversing that decision is a non-starter.
But perhaps we can add a reference count to the znode itself,
when it is pending in a log transaction for zfs_get_data later,
so that it is only freed after both zfs_netbsd_reclaim _and_
zfs_get_data are done with it. Note that the only use that
zfs_get_data makes of the _vnode_ is to release a reference
(which we have made into a no-op because it is taking that
`reference' only during reclamation when acquiring new vnode
references is forbidden).
This will also require making sure the object id is not
recycled too early -- possibly by some combination of
ZFS_OBJ_HOLD_ENTER and ZFS_TEARDOWN_INACTIVE_ENTER/EXIT_READ,
not sure.
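The reference-counting idea above could be modelled roughly as
follows. This is a single-threaded userland sketch, not kernel
code: struct znode_model and the znode_model_* functions are
hypothetical names, and a real implementation would need atomic
operations or a lock around the count, plus the object id
protection discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct znode; not the real structure. */
struct znode_model {
	unsigned	 zm_refcnt;	/* vnode + pending log records */
	bool		*zm_freed;	/* set on free, for the demo only */
};

/* Create with one reference, owned by the vnode. */
struct znode_model *
znode_model_create(bool *freed)
{
	struct znode_model *zm = malloc(sizeof(*zm));

	if (zm == NULL)
		return NULL;
	zm->zm_refcnt = 1;
	zm->zm_freed = freed;
	*freed = false;
	return zm;
}

/* Taken when a write queues a log record that will need the znode. */
void
znode_model_hold(struct znode_model *zm)
{
	assert(zm->zm_refcnt > 0);
	zm->zm_refcnt++;
}

/* Dropped by reclaim and by each finished log callback. */
void
znode_model_rele(struct znode_model *zm)
{
	assert(zm->zm_refcnt > 0);
	if (--zm->zm_refcnt == 0) {
		*zm->zm_freed = true;
		free(zm);
	}
}

/*
 * Walk through the problem ordering: reclaim runs before the log
 * callback.  Returns true if the model outlived reclaim and was
 * freed only once the callback released it.
 */
bool
znode_model_demo(void)
{
	bool freed, alive_after_reclaim;
	struct znode_model *zm = znode_model_create(&freed);

	if (zm == NULL)
		return false;
	znode_model_hold(zm);		/* write queued */
	znode_model_rele(zm);		/* reclaim: no zil_commit needed */
	alive_after_reclaim = !freed;
	znode_model_rele(zm);		/* log callback finished */
	return alive_after_reclaim && freed;
}
```

In this model, reclaim only drops the vnode's reference instead of
waiting on a zil_commit barrier, and whichever side finishes last
frees the structure.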