Subject: kern/24596: genfs_putpages() problems
To: None <>
From: None <>
List: netbsd-bugs
Date: 02/29/2004 02:56:28
>Number:         24596
>Category:       kern
>Synopsis:       genfs_putpages() problems
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 29 02:57:00 UTC 2004
>Originator:     Stephan Uphoff
>Release:        current
1) genfs_putpages assumes that a range is synchronized 
   if it encounters no pages that it must clean and if
   v_numoutput was zero on entry of the function.

   It simply skips pages marked PG_RELEASED or PG_PAGEOUT.
   This is wrong.
   If pages marked PG_RELEASED or PG_PAGEOUT are encountered,
   wasclean must be set to FALSE.
   Reading v_numoutput on entry is not enough, because a second,
   concurrent call can write the pages.
   (The first call can also block on a clean page and never encounter
   any dirty pages.)

   This can violate fsync(2), NFS and other data stability guarantees.

   Solution: set wasclean to FALSE when encountering pages marked
             PG_RELEASED or PG_PAGEOUT.
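
A minimal user-space sketch of the proposed change to the scan, not the actual genfs_putpages code. The struct, the flag values and the function name model_scan_range are hypothetical stand-ins chosen for illustration; only the flag names and wasclean come from the report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the UVM page flags; the values are illustrative. */
#define PG_CLEAN    0x01u
#define PG_RELEASED 0x02u
#define PG_PAGEOUT  0x04u

struct model_page {
	unsigned flags;
};

/*
 * Model of the range scan. Returns true only if the range may be
 * reported as synchronized.
 */
bool
model_scan_range(const struct model_page *pgs, size_t npages,
    int numoutput_on_entry)
{
	bool wasclean = (numoutput_on_entry == 0);

	for (size_t i = 0; i < npages; i++) {
		if (pgs[i].flags & (PG_RELEASED | PG_PAGEOUT)) {
			/*
			 * Proposed fix: another thread owns this page and may
			 * still be writing it, so the range is not known to
			 * be stable. Skip the page, but remember it.
			 */
			wasclean = false;
			continue;
		}
		if ((pgs[i].flags & PG_CLEAN) == 0) {
			/* A dirty page; the real code would write it here. */
			wasclean = false;
		}
	}
	return wasclean;
}
```

With this change, a range that contains only clean pages plus one page marked PG_PAGEOUT is no longer reported as clean, even if v_numoutput was zero on entry.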

2) genfs_putpages takes the vnode off the v_synclist if it thinks
   that the file is clean.
   Since genfs_putpages can block even when the file is clean,
   the file might have accumulated new dirty blocks that the scan
   did not visit.

   These dirty blocks might not be flushed for a very long time.

   Solution: add a generation number to genfs_node?
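
One way the generation-number idea could work, sketched as a user-space model. The field dirtygen and all function names here are assumptions made for illustration; the report only suggests adding a generation number to genfs_node and does not say how it would be used.

```c
#include <stdbool.h>

/* Hypothetical model of the relevant genfs_node state. */
struct model_gnode {
	unsigned long dirtygen;   /* assumed: bumped whenever a page is dirtied */
	bool on_synclist;         /* models membership in the v_synclist */
};

/* Called whenever a page of the file becomes dirty. */
void
model_mark_dirty(struct model_gnode *gp)
{
	gp->dirtygen++;
	gp->on_synclist = true;
}

/* Snapshot the generation before the scan starts (and may block). */
unsigned long
model_putpages_begin(const struct model_gnode *gp)
{
	return gp->dirtygen;
}

/*
 * After the scan: take the vnode off the list only if the scan saw
 * nothing to clean AND nothing was dirtied while the scan ran.
 */
void
model_putpages_end(struct model_gnode *gp, unsigned long gen_at_start,
    bool wasclean)
{
	if (wasclean && gp->dirtygen == gen_at_start)
		gp->on_synclist = false;
}
```

If a page is dirtied between model_putpages_begin and model_putpages_end, the generations differ and the vnode stays on the list, so a later sync pass still visits it.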

3) genfs_putpages does not write-protect the pages that it encounters.
   This means existing writable mappings can dirty pages after the
   vnode is removed from the v_synclist.
   (And no fault will reinsert it in the v_synclist.)

   ubc_alloc/ubc_release, used by filesystem write operations, can
   operate on cached writable mappings to the pages, and the dirty
   blocks might stay in memory indefinitely without being flushed.

   (Jason, Chuck: we exchanged emails about this part last summer.)

   I believe this is only a problem for write(2), since requiring
   an explicit msync for writable mmap memory is expected behavior.
   (I am not sure about this; I will look up the standards in the
   next few days.)

   Solution: unconditionally re-insert the vnode in the v_synclist
             in VOP_WRITE after the last ubc_release call.
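
The proposed fix for item 3 can be pictured with the following user-space model. It is only a sketch: model_vnode, model_copy_chunk and model_write are hypothetical names, and the chunk copy stands in for one ubc_alloc/copy/ubc_release round rather than calling the real UBC interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the vnode state that matters here. */
struct model_vnode {
	bool on_synclist;     /* models membership in the v_synclist */
	size_t dirty_bytes;   /* bytes dirtied through cached mappings */
};

/* Stand-in for one ubc_alloc / copy / ubc_release round. */
static void
model_copy_chunk(struct model_vnode *vp, size_t len)
{
	vp->dirty_bytes += len;
}

/* Model of a filesystem write routine that copies in fixed-size chunks. */
size_t
model_write(struct model_vnode *vp, size_t resid, size_t chunk)
{
	size_t written = 0;

	if (chunk == 0)
		return 0;
	while (resid > 0) {
		size_t len = resid < chunk ? resid : chunk;

		model_copy_chunk(vp, len);
		resid -= len;
		written += len;
	}
	/*
	 * Proposed fix: after the last release, put the vnode back on
	 * the list unconditionally, whatever a concurrent putpages
	 * scan decided in the meantime.
	 */
	vp->on_synclist = true;
	return written;
}
```

Because the re-insertion happens after the final chunk, a scan that removed the vnode from the list while the write was in progress cannot leave the newly dirtied pages unreachable by the syncer.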


see "Full Description"