Subject: Re: MACH vm code problems (vm_fault doesn't consult underlying VFS)
To: None <rminnich@descartes.super.org>
From: Mike Hibler <mike@cs.utah.edu>
List: port-sparc
Date: 06/08/1994 12:55:12
[ Mike K. forwarded your message to me. ]

SYNOPSIS: It's all my fault! :-)

> Date: Fri, 3 Jun 1994 06:48:53 -0400 (EDT)
> From: Ronald G Minnich <rminnich@descartes.super.org>
> Subject: MACH vm code problems (vm_fault doesn't consult underlying VFS)
> To: port-sparc@sun-lamp.cs.berkeley.edu, bsdi-users@BSDI.COM
> 
> I'm trying to get a modified version of NFS, called MNFS, working under 
> bsdi/netbsd, and have run into a problem with the mach vm model. I'm 
> pondering options, and wanted to ping this knowledgeable community for 
> ideas. 
> ...
> 

Standard disclaimer #1: BSD VM != Mach VM.
The BSD VM was based on Mach 2.0, which predates (or roughly coincides with)
the addition of external pagers to Mach.  Mach 2.5 or above, with the external
pager interface, would allow you to do what you need to do.

> Here is where I run into the mach vm model. In the mach vm model, the 
> vm_pager acts as a page cache, and once it gets a page, it is willing to 
> satisfy read/write faults on that page *without consulting the underlying 
> vfs*. So, unlike (e.g.) AIX, sunos and solaris, it's not possible in the 
> current implementation to have the vm_pager come back to the VFS and say "I 
> asked for this readonly before; can I have it writeable now?". 
> 

Standard disclaimer #2: the BSD VM was only supposed to be a prototype.
(I hide behind this one a lot! :-) Ironically, the ability to consult the
pager when taking a write fault on a RO-page was even in the Mach 2.0 code
I based this on.  I ripped it out because it was only applicable (or so I
thought) to the external pager code and I didn't want to deal with external
pagers.  Something similar could be added back.  My first thought would be
to use the vm_pager_has_page() routine, adding your faultinfo struct as a
parameter.  The only other use of that routine is as an optimization to
determine whether a copy object already has a copy of a particular page (to
avoid a redundant copy).  You could pass a null faultinfo pointer in that case.
Or you could just define a new routine.
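
A rough sketch of the has_page idea in C, under stated assumptions: everything
named toy_* is hypothetical, the fi_* field names and the typedefs are
stand-ins rather than the real BSD VM definitions, and treating "access not
yet granted" as "page not present" is only one way the check could behave.

```c
#include <stddef.h>

/* Stand-in types; the real kernel definitions differ. */
typedef unsigned long vm_offset_t;
typedef int vm_prot_t;
#define VM_PROT_READ  0x1
#define VM_PROT_WRITE 0x2

struct faultinfo {
    vm_offset_t fi_vaddr;   /* faulting virtual address */
    vm_prot_t   fi_prot;    /* access type that faulted */
};

/* Hypothetical pager that backs exactly one page. */
struct toy_pager {
    vm_offset_t tp_offset;  /* object offset of the page it holds */
    vm_prot_t   tp_granted; /* access the backing store has granted so far */
};

/*
 * has_page with an extra faultinfo argument.
 * fi == NULL: plain existence test, as for the copy object optimization.
 * fi != NULL: also require that the faulting access is already granted.
 */
int
toy_pager_has_page(const struct toy_pager *pg, vm_offset_t offset,
    const struct faultinfo *fi)
{
    if (offset != pg->tp_offset)
        return 0;
    if (fi == NULL)
        return 1;
    return (fi->fi_prot & ~pg->tp_granted) == 0;
}
```

With this shape, a write fault on a page held read-only makes the routine
return 0, which gives the fault code a reason to go back to the pager.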

> In addition, for reasons I don't understand, the vm_pager doesn't pass any
> fault info to the underlying dev, vnode, or other pager code. We've found
> it handy for writing drivers for hardware we've built as well as for mnfs
> to have the faulting virtual address and the protections available to the
> vfs/driver code that actually handles the fault (which happens in sunos
> and solaris). 
> 

The reason is simple: the existing pagers didn't need that information.
I'm curious about your need for the faulting VA.  Do you need it just as
a tag to uniquely identify the correct backing store object and offset?
If so, that info is already available to the pager via the pager struct,
which (indirectly) provides the object, and via the vm_page struct, which
has the object offset.  Though I suppose that in a DSM, the VA is the most
convenient handle.  The reason I bring this up is that the Mach external
pager mechanism doesn't pass a VA, just the object/offset, and DSMs have
been implemented using Mach.
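
To illustrate why object/offset can serve as the tag, here is a small sketch
in C.  The toy_* structures and toy_lookup() are hypothetical simplifications
of a map entry, not the real vm_map code.  Two mappings of the same object at
different virtual addresses resolve to the same (object, offset) pair.

```c
/* Stand-in type; the real kernel definition differs. */
typedef unsigned long vm_offset_t;

struct toy_object {
    int to_id;              /* placeholder for real object state */
};

/* Hypothetical map entry: VA range [start, end) backed by an object. */
struct toy_map_entry {
    vm_offset_t        start;
    vm_offset_t        end;
    struct toy_object *object;
    vm_offset_t        offset;  /* object offset that "start" maps to */
};

/*
 * Translate a faulting VA into the (object, offset) pair that names
 * the backing store.  Returns 1 on success, 0 if va is outside the entry.
 */
int
toy_lookup(const struct toy_map_entry *e, vm_offset_t va,
    struct toy_object **objp, vm_offset_t *offp)
{
    if (va < e->start || va >= e->end)
        return 0;
    *objp = e->object;
    *offp = e->offset + (va - e->start);
    return 1;
}
```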

> So, I'm trying to work up changes that make the fault code more useful 
> for us. First desired change is:
> struct faultinfo
>   {
>     caddr_t faulting_vaddr;
>     vm_prot_t  faulting_protections; /* shorten these to taste :-) */
>   };
> 
> faultinfo gets handed to the getpage function(s), i.e. it's an added 
> fourth parameter to the three that are there now. 
> 

This seems reasonable.  In the Mach world, faulting_vaddr would be a
vm_offset_t.  In the BSD world, the names would be shorter :-)
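
A sketch of what that might look like in C, with shortened names.  The fi_*
names, the toy_* types, and the choice of (pager, page, sync) as the three
existing arguments are assumptions for illustration; only the pgo_ prefix is
borrowed from the pager ops naming style.

```c
/* Stand-in types; the real kernel definitions differ. */
typedef unsigned long vm_offset_t;
typedef int vm_prot_t;
typedef int boolean_t;
#define VM_PROT_READ  0x1
#define VM_PROT_WRITE 0x2

struct faultinfo {
    vm_offset_t fi_va;      /* faulting virtual address */
    vm_prot_t   fi_prot;    /* faulting protections */
};

struct toy_page {
    vm_offset_t pg_offset;  /* offset within the backing object */
    vm_prot_t   pg_prot;    /* access granted by the pager */
};

struct toy_pager;

/* Hypothetical ops vector: getpage grows a fourth parameter. */
struct toy_pagerops {
    int (*pgo_getpage)(struct toy_pager *, struct toy_page *,
        boolean_t, const struct faultinfo *);
};

struct toy_pager {
    const struct toy_pagerops *tp_ops;
    vm_offset_t                tp_last_va;  /* last faulting VA seen */
};

/* Example pager routine that actually looks at the fault info. */
static int
toy_getpage(struct toy_pager *pg, struct toy_page *m, boolean_t sync,
    const struct faultinfo *fi)
{
    (void)sync;                 /* unused in this toy version */
    pg->tp_last_va = fi->fi_va;
    m->pg_prot = fi->fi_prot;   /* pretend the backing store agreed */
    return 0;
}

static const struct toy_pagerops toy_ops = { toy_getpage };

/* Generic wrapper: simply forwards the fault info to the pager. */
int
toy_pager_get(struct toy_pager *pg, struct toy_page *m, boolean_t sync,
    const struct faultinfo *fi)
{
    return pg->tp_ops->pgo_getpage(pg, m, sync, fi);
}
```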

> Second change: in the initial step in vm_fault where it's looking up the
> page, have it consult the underlying pager (i.e. vfs/driver/etc.) to see
> if the pager thinks it has the page. This will give the VFS a chance to do
> what's needed. 
> 

This is what used to happen.  Put it after the "page busy" check.
Presumably this pager routine could block, so you would have to unlock
things and then start over after returning (a la the busy check).  You would
also need to busy out the page in question.
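
The control flow described above could be sketched in C roughly as follows.
This is a toy model, not vm_fault itself: the toy_* names are hypothetical,
the "lock" is just a flag, and a busy page is cleared on the spot where real
code would sleep and be woken up.

```c
/* Stand-in types; the real kernel definitions differ. */
typedef int vm_prot_t;
#define VM_PROT_READ  0x1
#define VM_PROT_WRITE 0x2

struct toy_page {
    int       busy;         /* someone else is working on the page */
    vm_prot_t granted;      /* access the pager has granted so far */
};

struct toy_object {
    int             locked; /* stand-in for the object lock */
    struct toy_page page;
    int             pager_calls;
};

/* Hypothetical pager hook.  It may block, so it refuses a locked object. */
static int
toy_pager_check(struct toy_object *obj, vm_prot_t want)
{
    if (obj->locked)
        return 0;
    obj->pager_calls++;
    obj->page.granted |= want;  /* pretend the VFS upgraded the page */
    return 1;
}

/* Returns 0 once the page can satisfy "want", -1 if the pager refuses. */
int
toy_fault(struct toy_object *obj, vm_prot_t want)
{
    int ok;

    for (;;) {
        obj->locked = 1;
        if (obj->page.busy) {
            /* Busy check: unlock, wait, start over. */
            obj->locked = 0;
            obj->page.busy = 0; /* simulates the wakeup */
            continue;
        }
        if ((want & ~obj->page.granted) == 0) {
            obj->locked = 0;
            return 0;           /* page already good enough */
        }
        obj->page.busy = 1;     /* busy out the page ... */
        obj->locked = 0;        /* ... and unlock before blocking */
        ok = toy_pager_check(obj, want);
        obj->locked = 1;
        obj->page.busy = 0;
        obj->locked = 0;
        if (!ok)
            return -1;
        /* Start over: state may have changed while unlocked. */
    }
}
```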

> These are preliminary thoughts on how to get mnfs to work on xxxbsd. I've
> got it working now on four different OS'es on three different vendor
> boxes. I can't ship any of that code, so I'd like to get it working on
> bsd. On one vendor's box I had to hack the page fault code specifically
> for mnfs, but I couldn't look in the mirror next morning (but those
> changes are probably there to stay). So I'd like to work out a way to do
> this that's acceptable to this community. Any thoughts would be welcome. 
> 

I don't speak for BSDI or NetBSD, but I think your proposed changes are quite
acceptable.  If I get a chance I will try to put them into post-4.4lite.

------------------------------------------------------------------------------