Subject: kern/24596: genfs_putpages() problems
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <ups@tree.com>
List: netbsd-bugs
Date: 02/29/2004 02:56:28
>Number: 24596
>Category: kern
>Synopsis: genfs_putpages() problems
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Feb 29 02:57:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator: Stephan Uphoff
>Release: current
>Organization:
>Environment:
N/A
>Description:
1) genfs_putpages assumes that a range is synchronized
if it encounters no pages that it must clean and if
v_numoutput was zero on entry to the function.
It simply skips pages marked PG_RELEASED or PG_PAGEOUT.
This is wrong.
If pages marked PG_RELEASED or PG_PAGEOUT are encountered,
wasclean must be set to FALSE.
Reading v_numoutput on entry is not enough, as a second concurrent
call can write the pages.
(And the first call can block on a clean page and never encounter
dirty pages.)
This can violate fsync(2), NFS and other data stability guarantees.
Solution: set wasclean to false when encountering pages marked
PG_RELEASED or PG_PAGEOUT
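
A minimal user-space sketch of the proposed rule for 1), not the actual
genfs_putpages() code. Everything prefixed model_ or MODEL_ is hypothetical
and the flag values are made up; only the decision logic described above is
modelled. The count_inflight switch selects between the current behaviour
(skip and ignore) and the proposed one (skip, but clear wasclean).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel page flags; values are illustrative. */
#define MODEL_PG_CLEAN    0x01u
#define MODEL_PG_RELEASED 0x02u
#define MODEL_PG_PAGEOUT  0x04u

struct model_page {
	unsigned flags;
};

/*
 * Model of the range scan.  Returns the final value of wasclean.
 * count_inflight == false: pages being released or paged out are ignored.
 * count_inflight == true:  such pages clear wasclean (the proposed fix).
 */
static bool
model_range_wasclean(const struct model_page *pgs, size_t npages,
    int numoutput_on_entry, bool count_inflight)
{
	bool wasclean = (numoutput_on_entry == 0);

	assert(pgs != NULL || npages == 0);
	for (size_t i = 0; i < npages; i++) {
		unsigned f = pgs[i].flags;

		if (f & (MODEL_PG_RELEASED | MODEL_PG_PAGEOUT)) {
			/* Someone else owns this page; it is skipped either way. */
			if (count_inflight)
				wasclean = false;
			continue;
		}
		if (!(f & MODEL_PG_CLEAN))
			wasclean = false;	/* dirty page we must clean */
	}
	return wasclean;
}
```

With one clean page and one page marked MODEL_PG_PAGEOUT, the model reports
the range as clean when count_inflight is false and as not clean when it is
true, which is the difference the fix is meant to make.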
2) genfs_putpages takes the vnode off the v_synclist if it thinks
that the file is clean.
Since genfs_putpages can block even when the file is clean,
the file might have accumulated new dirty blocks not accessed
by the scan.
These dirty blocks might not be flushed for a very long time.
Solution: add a generation number to genfs_node?
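
One way the generation number idea for 2) could look, as a user-space sketch
only. The struct, the dirty_gen field and all model_ functions are
hypothetical; the real genfs_node has no such field as far as this report
knows. The scan records the generation on entry and takes the node off the
sync list only if the scan saw no dirty pages and the generation is unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a node; not the real struct genfs_node. */
struct model_node {
	unsigned long dirty_gen;	/* bumped every time a page is dirtied */
	bool on_synclist;		/* stands in for v_synclist membership */
};

/* Called whenever a page of the file becomes dirty. */
static void
model_mark_dirty(struct model_node *n)
{
	n->dirty_gen++;
	n->on_synclist = true;
}

/* Start of the scan: remember the generation seen on entry. */
static unsigned long
model_scan_begin(const struct model_node *n)
{
	return n->dirty_gen;
}

/*
 * End of the scan: remove the node from the sync list only if the scan
 * found nothing to clean AND nothing was dirtied while it could block.
 */
static void
model_scan_end(struct model_node *n, unsigned long gen_on_entry, bool wasclean)
{
	if (wasclean && n->dirty_gen == gen_on_entry)
		n->on_synclist = false;
}
```

If model_mark_dirty() runs between model_scan_begin() and model_scan_end(),
the node stays on the list even though the scan itself saw only clean pages.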
3) genfs_putpages does not write protect the pages that it encounters.
This means existing writable mappings can dirty pages after the
vnode is removed from the v_synclist.
( And no fault will reinsert it in the v_synclist)
ubc_alloc/ubc_release used by filesystem write operations can
operate on cached writable mappings to the pages, and the dirty
blocks might hang around in memory forever without being flushed.
( Jason, Chuck - we exchanged emails about this part last summer)
I believe this is only a problem for write(2) since requiring
an explicit msync for writable mmap memory is expected behavior.
( Not sure about this - I will look up the standards in the next few days)
Solution: unconditionally re-insert the vnode in the v_synclist
in VOP_WRITE after the last ubc_release call.
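
A rough user-space sketch of the solution proposed for 3). The struct, the
window size and model_write() are hypothetical and do not use the real
ubc_alloc/ubc_release interfaces; the memcpy stands in for a store through a
cached writable window, which takes no fault and so cannot put the vnode back
on the list by itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_WINDOW 8		/* illustrative window size in bytes */

/* Hypothetical model of a vnode with a small backing buffer. */
struct model_vnode {
	char data[64];
	bool dirty;
	bool on_synclist;	/* stands in for v_synclist membership */
};

/*
 * Model of a write operation that copies through fixed-size windows.
 * Returns the number of bytes written (clamped to the buffer size).
 */
static size_t
model_write(struct model_vnode *vp, size_t off, const char *src, size_t len)
{
	size_t done = 0;

	if (off > sizeof(vp->data))
		return 0;
	if (len > sizeof(vp->data) - off)
		len = sizeof(vp->data) - off;

	while (done < len) {
		size_t chunk = len - done;

		if (chunk > MODEL_WINDOW)
			chunk = MODEL_WINDOW;
		/* One allocate / copy / release cycle on a cached window. */
		memcpy(vp->data + off + done, src + done, chunk);
		vp->dirty = true;
		done += chunk;
	}

	/* Proposed fix: after the last release, always re-insert the vnode. */
	vp->on_synclist = true;
	return done;
}
```

Starting from a vnode that an earlier flush took off the list, a call to
model_write() leaves it dirty and back on the list, so a later sync pass
would visit it again.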
>How-To-Repeat:
>Fix:
see "Full Description"
>Release-Note:
>Audit-Trail:
>Unformatted: