Source-Changes archive

CVS commit: [netbsd-11] src



Module Name:    src
Committed By:   martin
Date:           Fri Apr  3 12:53:54 UTC 2026

Modified Files:
        src/external/cddl/osnet/dist/lib/libzfs/common [netbsd-11]:
            libzfs_import.c
        src/external/cddl/osnet/dist/uts/common/fs/zfs [netbsd-11]: zfs_ioctl.c
            zfs_log.c zfs_rlock.c zfs_vfsops.c zfs_vnops.c zfs_znode.c zvol.c
        src/external/cddl/osnet/dist/uts/common/fs/zfs/sys [netbsd-11]:
            zfs_rlock.h zfs_vfsops.h zfs_znode.h
        src/external/cddl/osnet/sys/kern [netbsd-11]: vfs.c
        src/external/cddl/osnet/sys/sys [netbsd-11]: vnode.h
        src/sys/kern [netbsd-11]: vfs_mount.c

Log Message:
Pull up following revision(s) (requested by yamt in ticket #244):

        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.31
        external/cddl/osnet/dist/lib/libzfs/common/libzfs_import.c: revision 1.9
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_znode.c: revision 1.35
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.32
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_znode.c: revision 1.36
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.33
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.82
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.34
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.83
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.35
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.84
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.85
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.86
        external/cddl/osnet/sys/sys/vnode.h: revision 1.22
        external/cddl/osnet/sys/kern/vfs.c: revision 1.10
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.87
        external/cddl/osnet/sys/sys/vnode.h: revision 1.23
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.88
        external/cddl/osnet/dist/uts/common/fs/zfs/sys/zfs_znode.h: revision 1.10
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.89
        external/cddl/osnet/dist/uts/common/fs/zfs/sys/zfs_znode.h: revision 1.11
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_ioctl.c: revision 1.27
        external/cddl/osnet/dist/uts/common/fs/zfs/sys/zfs_rlock.h: revision 1.4
        external/cddl/osnet/dist/uts/common/fs/zfs/zvol.c: revision 1.15
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.90
        external/cddl/osnet/dist/uts/common/fs/zfs/zvol.c: revision 1.16
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.91
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.92
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.93
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.94
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.95
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.96
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.97
        external/cddl/osnet/dist/uts/common/fs/zfs/sys/zfs_vfsops.h: revision 1.2
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.98
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.99
        sys/kern/vfs_mount.c: revision 1.111
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_rlock.c: revision 1.7
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_rlock.c: revision 1.8
        external/cddl/osnet/dist/uts/common/fs/zfs/zfs_log.c: revision 1.2

zfs: add zfs_range_lock_try

i plan to use this to fix pgdaemon deadlock issue. (PR/60004)
(thus i didn't bother to implement RL_READER.)
note: recent openzfs has a similar function. (zfs_rangelock_tryenter)
this commit ought to be reverted when/if we switch to it.
PR 60004

zfs_netbsd_putpages: do not make the pagedaemon block on the range lock
blocking here can end up with a deadlock because ordinary
vnops can wait for memory holding the range lock.
fixes PR/60004

solaris vfs_optionisset: treat 0 as unspecified
this allows users to leave it default.
before this change, when a user runs "zfs mount -a",
it was processed as "mount them read-write, overriding readonly property".
i don't think it's what the user usually intends.
looking at the illumos code, it seems that mount options there are
basically tri-state. that is, "ro", "rw", and unspecified.
as NetBSD only has a single bit, MNT_RDONLY or !MNT_RDONLY, this commit
maps !MNT_RDONLY to unspecified, which i believe more often matches
the user's intention. it also seems like what illumos does for the legacy
MS_RDONLY bit if i read their code correctly. that is, if MS_RDONLY is set,
it sets MNTOPT_RO. on the other hand, a lack of MS_RDONLY doesn't imply
MNTOPT_RW.
references:
"Temporary Mount Point Properties" section of zfs(8)
PR/60024
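
a rough sketch of the tri-state idea described above, not the code from
this commit. the mount flag bit maps to "ro" or "unspecified", and an
unspecified option falls back to the dataset's readonly property. every
identifier below (ro_opt, SK_MNT_RDONLY, ...) is made up for illustration.
```c
#include <assert.h>
#include <stdbool.h>

/* hypothetical names; not the identifiers used in the tree */
enum ro_opt { RO_UNSPECIFIED, RO_ON, RO_OFF };

#define SK_MNT_RDONLY	0x1	/* stand-in for the MNT_RDONLY bit */

/* a set bit means "ro"; a clear bit means "unspecified", not "rw" */
static enum ro_opt
ro_opt_from_flags(int flags)
{
	return (flags & SK_MNT_RDONLY) ? RO_ON : RO_UNSPECIFIED;
}

/* an unspecified option leaves the readonly property in charge */
static bool
effective_readonly(enum ro_opt opt, bool readonly_prop)
{
	switch (opt) {
	case RO_ON:
		return true;
	case RO_OFF:
		return false;
	default:
		assert(opt == RO_UNSPECIFIED);
		return readonly_prop;
	}
}
```
with this mapping, a mount request without the bit on a dataset with
readonly=on stays read-only.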

zfs: fix a deadlock in read()
while zfs on netbsd is a non-UBC filesystem, we have logic that tries to
mimic UBC-like consistency between mmap and read/write, which
some "broken" applications might rely on. however, the logic is not
safe as indicated by the XXXNETBSD comment. it isn't safe because
touching user pages can involve page faults, which may need to block
on other (or even same) pages with an undefined locking order.
this commit fixes it by using an intermediate buffer to avoid touching
user pages while keeping a file page busy.
although this probably can be optimized by checking VV_MAPPED, i'm not
in a mood to complicate this already-complicated code further. because
zfs doesn't use UBC, if a file has uvm pages, it almost certainly has
VV_MAPPED anyway.

tested with https://github.com/yamt/garbage/blob/master/c/ubc/ubctest.c
an alternative fix would be to drop this UBC-compat logic altogether.
while it surely simplifies the code, it might break some applications
which don't use msync properly. i suspect such applications are not
so rare, because UBC is ubiquitous among modern operating systems
these days.
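
a userland toy showing the shape of the intermediate-buffer approach, not
the kernel code. toy_page and toy_read_page are hypothetical. the point is
only the ordering: the destination buffer is touched after the page is no
longer marked busy.
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE	4096

/* toy stand-in for a busy-able file page; not a real uvm structure */
struct toy_page {
	int busy;
	unsigned char data[TOY_PAGE_SIZE];
};

/*
 * copy up to len bytes at page offset off into the "user" buffer.
 * returns the number of bytes copied.
 */
static size_t
toy_read_page(struct toy_page *pg, size_t off, unsigned char *user,
    size_t len)
{
	unsigned char bounce[TOY_PAGE_SIZE];

	if (off >= TOY_PAGE_SIZE)
		return 0;
	if (len > TOY_PAGE_SIZE - off)
		len = TOY_PAGE_SIZE - off;

	assert(!pg->busy);
	pg->busy = 1;
	memcpy(bounce, pg->data + off, len);	/* no user memory touched */
	pg->busy = 0;

	/* in a kernel this copy may fault; no page is held busy here */
	memcpy(user, bounce, len);
	return len;
}
```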

zfs: fix data loss with some combinations of mmap and write
in write(), make a mmap page clean only when we are overwriting the
whole page. otherwise, modifications made via mmap which are outside
the overwritten region will be lost.
tested with https://github.com/yamt/garbage/blob/master/c/ubc/ubctest.c
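
the condition can be sketched as a small predicate. this is an
illustration only; write_covers_page is a hypothetical helper, not a
function in zfs_vnops.c.
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * does the write [woff, woff + wlen) cover the whole page that starts
 * at pgoff?  only then is it safe to mark the mmap page clean.
 */
static bool
write_covers_page(uint64_t woff, uint64_t wlen, uint64_t pgoff,
    uint64_t pgsize)
{
	assert(pgsize > 0);
	return woff <= pgoff && woff + wlen >= pgoff + pgsize;
}
```
a 100-byte write into a 4096-byte page does not cover it, so the page has
to stay dirty to keep the mmap modifications outside those 100 bytes.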

zfs: reject read() on directory
right now, netbsd in general allows read() on a directory for
compatibility with historical applications. (i have not
seen such an application by myself though. is anyone around
here still keeping such ancient binaries? i'm curious if such
a binary still works on today's UFS.)

this commit makes zfs reject such an attempt because zfs is
not prepared to produce the historical UFS dirent structure.

zfs: fix case insensitive / utf-8 normalized file names
zfs has a few options for file name comparison.
when they are enabled, disable netbsd's name cache, which only
supports exact-byte-matching, to avoid inconsistent behaviors.
cf. "casesensitivity" and "normalization" in zfs(8)

zfs: purge name cache on teardown
this fixes name cache inconsistencies on
certain events. eg. rollback
```
zfs create $FS
echo a > /$FS/a.txt
echo b > /$FS/b.txt
echo c > /$FS/c.txt
zfs snap $FS@2
rm /$FS/b.txt
cat /$FS/a.txt
cat /$FS/b.txt || :  # create negative cache entry
cat /$FS/c.txt
zfs rollback $FS@2
cat /$FS/a.txt
cat /$FS/b.txt  # hit the negative cache entry because of the bug
cat /$FS/c.txt
```

zfs zvol.c: #ifdef out zvol_log_truncate
the function is currently not used by netbsd.
disable compilation of it to make it easier to port patches.

zfs: remove unused whiteout logic

zfs: fix zfs_range_lock_try
the change "zfs: add zfs_range_lock_try" was incomplete.
i've observed the following deadlock:
```
db{0}> tr /a ffff96777f74f400
trace: pid 0 lid 125 at 0xffffce80c3203b50
sleepq_block() at netbsd:sleepq_block+0xf4
cv_wait() at netbsd:cv_wait+0xca
pool_grow() at netbsd:pool_grow+0x47b
pool_get() at netbsd:pool_get+0xae
pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x136
pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x27d
kmem_intr_alloc() at netbsd:kmem_intr_alloc+0x13c
kmem_alloc() at netbsd:kmem_alloc+0x2a
zfs_range_lock_impl() at zfs:zfs_range_lock_impl+0x30
zfs_netbsd_putpages() at zfs:zfs_netbsd_putpages+0x1c0
VOP_PUTPAGES() at netbsd:VOP_PUTPAGES+0x43
uvm_pageout() at netbsd:uvm_pageout+0x257
db{0}>
```
this commit fixes it by using KM_NOSLEEP when non-blocking
operation is requested.
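
a userland toy of a non-blocking range lock attempt, to illustrate why
both the overlap check and the allocation must be able to fail instead of
waiting. this is not zfs_range_lock_try. all names are hypothetical, and
malloc merely stands in for a KM_NOSLEEP allocation.
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* toy writer-only range lock */
struct toy_range {
	uint64_t off, len;
	struct toy_range *next;
};

struct toy_rlock {
	struct toy_range *held;
};

/*
 * try to take [off, off + len) without ever waiting.
 * returns NULL if the range overlaps a held one or if memory is not
 * immediately available; the caller is expected to back off.
 */
static struct toy_range *
toy_range_lock_try(struct toy_rlock *rl, uint64_t off, uint64_t len)
{
	struct toy_range *r;

	for (r = rl->held; r != NULL; r = r->next) {
		if (off < r->off + r->len && r->off < off + len)
			return NULL;	/* would block */
	}
	r = malloc(sizeof(*r));	/* must not sleep waiting for memory */
	if (r == NULL)
		return NULL;
	r->off = off;
	r->len = len;
	r->next = rl->held;
	rl->held = r;
	return r;
}

static void
toy_range_unlock(struct toy_rlock *rl, struct toy_range *r)
{
	struct toy_range **pp;

	for (pp = &rl->held; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == r) {
			*pp = r->next;
			free(r);
			return;
		}
	}
	assert(!"unlocking a range that is not held");
}
```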

zfs: reject all mount op flags for now
- our logic in zfs_vfsops.c is inconsistent. sometimes it checks
  uap->flags, sometimes vfsp->vfs_flag. (aka mnt_flag)
- our userland tools (zfs, mount_zfs) currently don't seem to have
  a way to pass these flags anyway. (zmount in libzfs always passes
  0 to both of mount(2) 'flags' argument and 'uap->flags'. although
  it stores something in uap->mflag and uap->optptr, nothing uses
  them. it doesn't even set MS_OPTIONSTR. we don't implement
  MS_OPTIONSTR anyway.)
this commit simply rejects them for safety. as these operations have
never been implemented in NetBSD, it shouldn't have any impact on users.
maybe someday we should fix these, but i guess it involves some
ABI changes, which i'm not in a mood to do right now.
related to PR/60026

zfs_vnops.c: fix whitespace
no functional changes are intended.

zfs_netbsd_gop_markupdate: actually update file timestamp
the implementation before this commit was basically no-op.
some notes:
* this is (ab)used in zfs_netbsd_write for fifo/spec vnodes.
  i feel it's a bit excessive to update the timestamp on every
  write to /dev/null. unfortunately, zfs doesn't have nodevmtime
  option. well, i suspect netbsd is the only os with the traditional
  devmtime behavior these days. we may want to implement delayed
  mtime update as ffs does.
* this is used by zfs_netbsd_putpages via genfs_putpages. but it's
  redundant because zfs_putapage updates the timestamp as well.
* this is not used by zfs_netbsd_getpages. zfs doesn't use
  genfs_getpages. zfs_netbsd_getpages doesn't have the
  corresponding logic either. maybe it's ok for most applications
  as long as mtime is updated sooner or later.

zfs: remove mysterious comments on read/write ops for spec/fifo

zfs: fix file vdev
make the solaris compat vn_openat honor the root vnode specified
by the caller. it's currently only used for vdev_file.c.
this commit fixes "no such pool or dataset" error on zpool create
with files:
```
uma% dd if=/dev/zero of=/tmp/hoge count=100
100+0 records in
100+0 records out
51200 bytes transferred in 0.001 secs (51200000 bytes/sec)
uma% sudo zpool create f /tmp/hoge
cannot create 'f': no such pool or dataset
uma%
```
cf. "file" in "Virtual Devices (vdevs)" in zpool(8).

zpool_find_import_impl: fix block/character device confusion
this commit fixes zpool import failure in some cases.
the current logic sometimes (eg. "zpool import -d" with a directory
which is not "/dev") ends up picking character devices
and passing them to the ZFS_IOC_POOL_TRYIMPORT/ZFS_IOC_POOL_IMPORT ioctls.
such attempts would fail, marking the corresponding vdevs UNAVAIL.
this commit fixes it by skipping character devices.
also, this commit makes the label checking logic prefer to use
character devices when available because it seems like the intention
of the upstream logic.
also, this commit fixes import of file-backed vdevs.

zfs: fix case insensitive / utf-8 normalized file names (cont.)
this was intended to be a part of an earlier commit.
("zfs: fix case insensitive / utf-8 normalized file names")
for some reason, it seems i unintentionally dropped this hunk
when porting the commit from git to cvs.

dounmount/vfs_insmntque: allow vcache_get during VFS_UNMOUNT
we currently have assertions to prevent file systems from
populating its vnode cache during VFS_UNMOUNT. this commit
relaxes the assertions a bit to allow vcache_get during
VFS_UNMOUNT. although VFS_UNMOUNT should still eventually
drain the vnode cache for the mount, this commit allows it
to populate its vnode cache temporarily.
this is for zfs, which sometimes needs to access a znode when
committing the log. (zfs_get_data)  a forthcoming zfs change
will depend on this change.
PR/59885
discussed on tech-kern.

zfs: fix "slow rm" issue
* stop committing zil in zfs_netbsd_reclaim and other operations
  in vnode reclaim path.
* retire zfs_zget_cleaner/VN_RELE_CLEANER.
  instead, just use normal zfs_zget and vrele_async.
note that these two changes depend on each other:
* zfs_zget_cleaner relies on zil_commit in zfs_netbsd_reclaim to
  ensure that the znode referenced by TX_WRITE itx is always in-core.
* otoh, zfs_zget_cleaner makes zil_commit in the vnode reclaim path
  possible. that is, zfs_netbsd_reclaim (VOP_RECLAIM) is called with
  the vnode in VS_RECLAIMING state, which would make vcache_vget
  block.
  if the vnode being reclaimed happened to have TX_WRITE itx on the
  zil, it deadlocks.
an alternative would be to make the upper layer (vfs_vnode.c) retain
unlinked vnodes for a while. (a bit longer than the 5 sec txg commit
interval should be enough.) eg. by making zfs_netbsd_inactive report
a_recycle = 0. but i guess it's better to remove
zfs_zget_cleaner/VN_RELE_CLEANER to keep the code less diverged
from the upstream zfs.

also, this commit makes zfs_umount retry vflush a bit.
it's necessary because, for some reason, during unmount, zil_close
commits the log, which can load some referenced vnodes back to the
cache. i don't understand why zil_close needs to commit the log
when we are syncing txg for unmount anyway. although it might be
possible to avoid the zil commit at all, probably this change
is less invasive than that. this logic is partly from J. Hannken-Illjes.
PR/59885
discussed on tech-kern.
https://mail-index.netbsd.org/tech-kern/2026/02/20/msg030817.html

zfs: flush mmap pages on fsync
it seems the logic to flush page cache in fsync has been removed
during the initial port to netbsd. at that point it was probably ok
because we simply didn't support mmap. since then, mmap support has
been added. but the fsync logic has not been restored. it means that
mmap-modified pages are left dirty basically forever, unless the
application explicitly performs msync on them or page daemon tries
to reclaim them on system memory shortage. it's bad especially for
a file system like zfs because writing data to zfs involves complex
locking and memory allocations, and thus not safe in the context of
the page daemon.

this commit fixes it (well, at least improves the situation a bit) by
putting back the page flushing logic.
ideally netbsd needs to have some throttling mechanism on
page-dirtying activities. i suppose such a mechanism can be
implemented in a mostly filesystem-independent manner.
(it was one of my motivations of yamt-pagecache branch.)

zfs: don't commit the zil for FSYNC_LAZY
FSYNC_LAZY is meant for periodic syncer activity.
unlike the fsync() system call, it doesn't give any promises
about data integrity to users.
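
the decision boils down to a flag check. a sketch, assuming stand-in flag
values (SK_FSYNC_WAIT and SK_FSYNC_LAZY are not the kernel's definitions)
and a hypothetical helper name:
```c
#include <stdbool.h>

/* stand-in flag values for illustration only */
#define SK_FSYNC_WAIT	0x1
#define SK_FSYNC_LAZY	0x2

/* a lazy (syncer-driven) fsync makes no durability promise */
static bool
fsync_wants_zil_commit(int flags)
{
	return (flags & SK_FSYNC_LAZY) == 0;
}
```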

zfs_putapage: don't try to write to zfs in the page daemon context
basically zfs is not prepared to be called safely from the page daemon.
for now, if we found the page dirty, (thus we need to push it into zfs)
just punt with ENOMEM. hopefully the page daemon will find some other
pages to reclaim.
if the system is already full of dirty pages backed by zfs, i suppose
there is no good way to recover. for a longer term, we probably need
some dirty-page throttling mechanism to avoid the situation in the
first place.
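
a sketch of the punt described above. putapage_decide is a hypothetical
helper, not the actual zfs_putapage logic; it only shows when ENOMEM
would be returned.
```c
#include <errno.h>
#include <stdbool.h>

/*
 * returns 0 when the page can be handled, ENOMEM when the caller is
 * the page daemon and the page would need a write into zfs.
 */
static int
putapage_decide(bool in_pagedaemon, bool page_dirty)
{
	if (in_pagedaemon && page_dirty)
		return ENOMEM;	/* punt; let the page daemon look elsewhere */
	return 0;
}
```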

zfs: fix "slow rm" issue (cont.)
commit a change which was lost while porting from
my local git repo to cvs.
fortunately, it was harmless to miss this change though.

zfs: fix deadlock with GOP_MARKUPDATE
because genfs_putpages calls GOP_MARKUPDATE with v_interlock held,
it isn't safe to wait for txg or other i/o. this is a regression
caused by a recent change.
("zfs_netbsd_gop_markupdate: actually update file timestamp")
this commit fixes it by simply dropping GOP_MARKUPDATE for zfs.
as mentioned in the commit message of the change in question,
it's redundant for putpages as we update the timestamps in
GOP_WRITE as well.
for spec/fifo, call the timestamp update logic directly,
not via GOP_MARKUPDATE.
the problem was pointed out by J. Hannken-Illjes.
he also tested this patch.

zfs: put back deferred atime update to VOP_INACTIVE
we currently push atime updates in VOP_RECLAIM and VFS_SYNC.
VFS_SYNC iterates all cached vnodes for that:
/*
* On NetBSD, we need to push out atime updates.  Solaris does
* this during VOP_INACTIVE, but that does not work well with the
* BSD VFS, so we do it in batch here.
*/
it isn't ideal for systems with large vnode cache.
i'm not sure why it "does not work well with the BSD VFS" either.
maybe historical reasons which don't hold anymore?
this commit puts the atime pushing logic back in VOP_INACTIVE, where
it's done in solaris and freebsd. it seems to work well as far as
i tested.
note: deferring it further to VOP_RECLAIM as we do for ffs has
its advantages. however, i prefer to keep the divergence from the
upstream smaller for now. i also have vague concerns on the
interactions with zfs features like snapshots. may revisit later.
discussed on tech-kern.
https://mail-index.netbsd.org/tech-kern/2026/03/17/msg030895.html

zfs: use 32-bit st_dev for stat(2)
while dev_t has been 64-bit on NetBSD since the merge of the
christos-time_t branch in 2009, we only use the lower
32 bits of it, at least for the purpose of specifying
a device in the kernel.
however, dev_t is also used as a file system id. eg. st_dev
reported by stat(2). as zfs has no device to naturally represent
its file system, currently it reports 56-bit guid of the file
system for the purpose.

unfortunately, some user applications still consider it as
a good old device id and assume operations like
makedev(major(dev),minor(dev)) preserve the value.
that doesn't hold for NetBSD's implementation of makedev and
friends, which only honor the lower 32 bits of the dev_t.
this commit makes zfs report fsid with the high 32 bits zeroed
to avoid the issues in such applications. namely, this fixes an
issue with rsync, reported by HIROSE yuuji on a japanese ML [1]
in 2024. you can find his reproduction recipe below. i was able to
reproduce the issue with rsync-3.4.1 from pkgsrc.
maybe we can "fix" our, at least userland-visible version of,
makedev and friends to provide full 64-bit round-trip as some
of other platforms do. (eg. glibc, freebsd)

although it might be an improvement and can benefit other things
like nfs v3, it isn't an alternative to this fix because it
doesn't fix existing application binaries built with the current
version of the macros.

note: this commit also changes statvfs f_fsid. as f_fsid is a long,
before this change, we were truncating the value only on 32-bit ports.
note: this commit doesn't change the "netbsd-extended" fsid
(f_fsidx), which is used for nfs file handles.

note: this commit would cause a flag day for applications which
somehow save st_dev of files. are there such applications?

note: this commit would increase the chance of fsid conflicts.
currently zfs ensures its fsids are unique within zfs, but not with
other netbsd file systems. with this commit, there can be conflicts
even within zfs. (mentioned in PR/60135)
```
rm -rf src
rm -rf dest
mkdir -p src/a/b/c
mkdir -p src/1/2/3
mkdir dest
rsync -avx --delete src dest
rm -r src/1
rsync -avx --delete src dest
test -d dest/src/1 && echo "this directory should have been removed"
```
[1] http://www.re.soum.co.jp/~jun/welcome.html#netbsd
related to PR/60135
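
a standalone sketch of the round-trip problem. the sk_* helpers are
stand-ins written for this illustration; their bit layout is an assumption
modeled on a makedev family that only looks at the lower 32 bits, so check
the real macros before relying on the details.
```c
#include <stdint.h>

typedef uint64_t sk_dev_t;	/* stand-in for a 64-bit dev_t */

static uint32_t
sk_major(sk_dev_t d)
{
	return ((uint32_t)d & 0x000fff00u) >> 8;
}

static uint32_t
sk_minor(sk_dev_t d)
{
	return (((uint32_t)d & 0xfff00000u) >> 12) |
	    ((uint32_t)d & 0x000000ffu);
}

static sk_dev_t
sk_makedev(uint32_t maj, uint32_t min)
{
	return (((sk_dev_t)maj << 8) & 0x000fff00u) |
	    (((sk_dev_t)min << 12) & 0xfff00000u) |
	    ((sk_dev_t)min & 0x000000ffu);
}

/* what some applications implicitly assume to be an identity */
static sk_dev_t
sk_roundtrip(sk_dev_t d)
{
	return sk_makedev(sk_major(d), sk_minor(d));
}
```
under these assumptions a value with any of the high 32 bits set does not
survive the round trip, while a value with the high 32 bits zeroed does.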

zfs: port a fix for data corruption issue from illumos
see https://www.illumos.org/issues/17734

note: i didn't bother to patch illumos/freebsd code in our tree.
the original commit message:
commit f6559a18843abdfa5849b9e74f239f9bd15796d3
Author: Andy Fiddaman <illumos%fiddaman.net@localhost>
Date:   Mon Nov 10 22:52:05 2025 +0000
17734 ZFS fsync can trigger ZIL transaction reordering and data corruption
Portions contributed by: Alexander Motin <mav%FreeBSD.org@localhost>
Reviewed by: Ryan Zezeski <ryan%zinascii.com@localhost>
Reviewed by: Toomas Soome <tsoome%me.com@localhost>
Approved by: Dan McDonald <danmcd%edgecast.io@localhost>

a review request on tech-kern:
https://mail-index.netbsd.org/tech-kern/2026/03/04/msg030862.html


To generate a diff of this commit:
cvs rdiff -u -r1.7 -r1.7.6.1 \
    src/external/cddl/osnet/dist/lib/libzfs/common/libzfs_import.c
cvs rdiff -u -r1.26 -r1.26.4.1 \
    src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_ioctl.c
cvs rdiff -u -r1.1.1.3 -r1.1.1.3.16.1 \
    src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_log.c
cvs rdiff -u -r1.6 -r1.6.16.1 \
    src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_rlock.c
cvs rdiff -u -r1.30 -r1.30.6.1 \
    src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c
cvs rdiff -u -r1.81 -r1.81.4.1 \
    src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c
cvs rdiff -u -r1.34 -r1.34.10.1 \
    src/external/cddl/osnet/dist/uts/common/fs/zfs/zfs_znode.c
cvs rdiff -u -r1.14 -r1.14.2.1 \
    src/external/cddl/osnet/dist/uts/common/fs/zfs/zvol.c
cvs rdiff -u -r1.3 -r1.3.16.1 \
    src/external/cddl/osnet/dist/uts/common/fs/zfs/sys/zfs_rlock.h
cvs rdiff -u -r1.1.1.3 -r1.1.1.3.16.1 \
    src/external/cddl/osnet/dist/uts/common/fs/zfs/sys/zfs_vfsops.h
cvs rdiff -u -r1.9 -r1.9.14.1 \
    src/external/cddl/osnet/dist/uts/common/fs/zfs/sys/zfs_znode.h
cvs rdiff -u -r1.9 -r1.9.6.1 src/external/cddl/osnet/sys/kern/vfs.c
cvs rdiff -u -r1.21 -r1.21.6.1 src/external/cddl/osnet/sys/sys/vnode.h
cvs rdiff -u -r1.110 -r1.110.2.1 src/sys/kern/vfs_mount.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.



