tech-kern: Re: Observations on our VM system problem

Subject: Re: Observations on our VM system problem
To: None <ghudson@MIT.EDU, tech-kern@NetBSD.ORG>
From: Mike Hibler <mike@fast.cs.utah.edu>
List: tech-kern
Date: 03/01/1996 23:47:22

> From: ghudson@MIT.EDU
> Date: Fri, 1 Mar 1996 06:12:27 -0500
> To: tech-kern@NetBSD.ORG
> Subject: Observations on our VM system problem
> 
> I wrote a test program to illustrate the canonical NetBSD VM system
> problem (vnode-backed VM pages can get out of sync with the filesystem
> and stay that way indefinitely).  ...
> ...
> Can anyone shed some light on this?  It could mean that vm_map_clean()
> is using incorrect logic and winds up cleaning the wrong objects, or
> it could mean that the vm structures are getting corrupt and there's a
> page corresponding to this vm object which simply isn't being cleaned.
> 

Ugh.  I know what the problem is.  You were pretty close with your first
guess about vm_map_clean().  It is cleaning/invalidating the wrong object,
or more exactly, not all the objects it needs to clean/invalidate.

When a MAP_PRIVATE mapping is created, the VM system immediately sets up
a shadow object for the vnode object to hold all modified pages.  In your
example, there are no modified pages since you were only reading the
mmap'ed data.  So you have:

shadow_object (resident_count==0) --shadows--> vnode_object (resident_count==1)

and vm_map_clean() is cleaning/invalidating only the first (shadow) object,
which has no pages, leaving the page (and its pmap translation) in the
vnode object untouched.

The rough solution isn't too bad: in vm_map_clean() you will have to
iterate over the shadow chain for the found object, cleaning/flushing
each object as you go.
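
To make the difference concrete, here is a toy userland model of the two
behaviours.  It is only a sketch: struct toy_object and the toy_* functions
are made-up names for illustration, not the kernel's vm_object or the real
vm_map_clean() code.  A real fix would presumably also have to translate
offsets from one object to the next and take the appropriate locks, which
this ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a VM object; NOT the kernel's struct vm_object. */
struct toy_object {
    int resident_count;         /* pages currently resident in this object */
    struct toy_object *shadow;  /* object this one shadows, or NULL */
};

/* Pretend to clean/invalidate every page of one object.
 * Returns the number of pages that were dropped. */
int toy_object_clean(struct toy_object *obj)
{
    int dropped = obj->resident_count;

    obj->resident_count = 0;
    return dropped;
}

/* The behaviour described above: only the top-level object is processed. */
int toy_clean_top_only(struct toy_object *top)
{
    assert(top != NULL);
    return toy_object_clean(top);
}

/* The suggested fix: follow the shadow chain, cleaning each object. */
int toy_clean_chain(struct toy_object *top)
{
    int dropped = 0;

    assert(top != NULL);
    for (struct toy_object *obj = top; obj != NULL; obj = obj->shadow)
        dropped += toy_object_clean(obj);
    return dropped;
}
```

With the two-object chain from the diagram (an empty shadow object in front
of a vnode object holding one page), toy_clean_top_only() drops nothing and
leaves the vnode object's page resident, while toy_clean_chain() reaches it.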

If there is anyone from the FreeBSD camp on this list, you probably have
the same problem.  You won't see the stale data problem, since your buffer
cache and page cache are unified.  However, you may still have the case
where msync(..., MS_INVALIDATE) doesn't actually invalidate all translations
in the range, only those whose pages exist in the top-level object.
Hence, you won't necessarily generate faults for all pages.