Subject: Re: Sheesh. More LKMs. And some firewall stupidity.
To: Henry B. Hotz <hotz@jpl.nasa.gov>
From: Bill Studenmund <wrstuden@zembu.com>
List: port-macppc
Date: 08/21/2000 13:55:02
On Mon, 21 Aug 2000, Henry B. Hotz wrote:

> At 3:12 AM -0400 8/19/00, gabriel rosenkoetter wrote:
> >On 08/16/2000 16:24:57, Henry B. Hotz <hotz@jpl.nasa.gov> wrote:
> >
> > > There is already support for swapping to a vnode filesystem.  Sounds
> > > like what you should be doing is defining the vnode calls to swap
> > > page data, not modifying UVM itself.
> 
> >As I understand it UVM won't use the vnode pager for standard memory,
> >it uses aobj/uvm_obj pairs for that, and making it use something else
> >would be more work than what I'm thinking of.

Yes, but as I understand it, those mappings are made at exec time. Your
migration will be a special kind of "exec", so you get to make those
mappings. (I assume) you can make a process's entire address space be
backed by "vnodes". :-)

> >Also, I don't see how a vnode FS (which exists as a file in the local
> >file system, no?) helps me get pages of data into the live memory of
> >another machine on the network.
> 
> Hmmm...  Perhaps I didn't think this all the way through.  My 
> starting point was the problem Bill S mentioned about the difficulty 
> of replacing existing routines and the fact that the kernel has a 
> mechanism for adding routines to support new filesystems.
> 
> The problem is that the latter is not *directly* a mechanism for 
> reading/writing blocks of memory from an external device.  Thinking 
> about it for about 10 seconds more than I did the first time I'd say 
> there are two approaches you could try:  1) you could use an LKM to 
> add a device driver that UVM could use to access the kind of external 
> memory you want, or 2) you could use an LKM to add a new filesystem 
> type which has whatever "side effects" you want for how the data is 
> actually stored.  Depending on how picky you are about what the data 
> looks like on the external machine this could get rather complex, and 
> might wind up looking a lot like an NFS mount of another machine's 
> memory file system.  Except that you would want to do something 
> special with the directory/inode structure to reduce overhead. 
> *More* complexity.

Note: this is a little bit of the blind leading the blind. I know
filesystems quite well, but not so much on the vm system. :-)

About adding a device: AFAIK the way you add a device to the vm system is
by adding it as swap. Swap can get used by any process, though, so it
isn't what you want.

You don't need an NFS-looking file system. File systems have two parts,
the name space and the backing store (the part that keeps the file in
blocks on the disk). You only need a special form of the latter. There is
no need for "your" vnodes to exist in a name space - no one will ever get
to one of them via an open() system call. :-) So half of the complexity of
a file system is gone. And since your reading & writing only ever do a
small subset of things, you don't need much backing store complexity
either.

You will need a mount point for your file system, as a lot of the vfs
system assumes there's a mount point around. :-) But I'd suggest hacking
kernfs to do it. Just teach kernfs about a new kind of vnode type (add a
new KTT_* tag alongside KTT_INT, etc.), but don't add that vnode type to
any of the tables. That
way it's not in the name space.
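
For concreteness (the tag name and value here are invented - check the
KTT_* list in miscfs/kernfs/kernfs.h for one that's free):

/*
 * Hypothetical new tag in miscfs/kernfs/kernfs.h.  The point is what
 * we *don't* do: no matching entry goes into the kern_targets[]
 * table, so nothing in the kernfs name space ever resolves to one of
 * these vnodes.
 */
#define KTT_MIGMEM	99	/* invented value - pick a free one */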

You could even give these vnodes a different set of operation vectors.
That way they are totally independent of the other kernfs ones. Your
routines are called only from these vnodes, and these vnodes call only
your routines.
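
In NetBSD terms that's just a second vnodeopv vector. A sketch, with
made-up migfs_* names (the vnodeopv machinery itself is the standard
stuff from vfs_init.c):

#include <sys/param.h>
#include <sys/vnode.h>

int	migfs_getattr __P((void *)), migfs_read __P((void *)),
	migfs_write __P((void *)), migfs_inactive __P((void *)),
	migfs_reclaim __P((void *));

int (**migfs_vnodeop_p) __P((void *));

const struct vnodeopv_entry_desc migfs_vnodeop_entries[] = {
	{ &vop_default_desc, vn_default_error }, /* everything else errors */
	{ &vop_getattr_desc, migfs_getattr },	/* UVM wants the size, IIRC */
	{ &vop_read_desc, migfs_read },		/* pull a block off the net */
	{ &vop_write_desc, migfs_write },	/* push a block over the net */
	{ &vop_inactive_desc, migfs_inactive },	/* last reference went away */
	{ &vop_reclaim_desc, migfs_reclaim },	/* vnode being recycled */
	{ NULL, NULL }
};

const struct vnodeopv_desc migfs_vnodeop_opv_desc =
	{ &migfs_vnodeop_p, migfs_vnodeop_entries };

A real vector wants lock/unlock & friends too (genfs has stubs you can
steal for most of those), and the new opv desc has to get initialized -
adding it to kernfs's list of opv descs should take care of that.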

Then figure out how to allocate & deallocate them. Allocation will happen
when your routines are doing some exec thing, and deallocation will happen
in the vop_reclaim routine, which runs when your vnode is getting reused.
You can also clean up in the vop_inactive routine, as these vnodes don't
persist across execs (i.e. once you're done with a vnode, you won't need
it again for the same process).
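
Roughly (made-up names again; getnewvnode(), malloc(), and free() are
the standard kernel interfaces, and VT_KERNFS gets reused since we're
hanging off the kernfs mount):

#include <sys/param.h>
#include <sys/malloc.h>
#include <sys/mount.h>
#include <sys/vnode.h>

int
migfs_alloc(struct mount *mp, struct vnode **vpp)
{
	struct vnode *vp;
	int error;

	/* getnewvnode() wires the vnode to our private op vector. */
	error = getnewvnode(VT_KERNFS, mp, migfs_vnodeop_p, &vp);
	if (error)
		return (error);
	vp->v_type = VREG;
	/* struct migfs_node is sketched below */
	vp->v_data = malloc(sizeof(struct migfs_node), M_TEMP, M_WAITOK);
	*vpp = vp;
	return (0);
}

int
migfs_reclaim(void *v)
{
	struct vop_reclaim_args *ap = v;

	/* Teardown: free the per-vnode state hung off v_data. */
	free(ap->a_vp->v_data, M_TEMP);
	ap->a_vp->v_data = NULL;
	return (0);
}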

Then figure out what information you need to keep to talk to the other
machine, and how to transfer blocks. You store this info in each vnode (in
your private storage off to the side). That way when your read or write
routine gets called, you have enough info to pull the block off of the net
or push it over. :-)
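
Something like this, say (all invented for illustration; the per-vnode
state hangs off vp->v_data, and the vop_read_args layout is the
standard one):

#include <sys/param.h>
#include <sys/socket.h>
#include <sys/vnode.h>
#include <netinet/in.h>

struct migfs_node {
	struct sockaddr_in mn_peer;	/* remote machine holding the pages */
	u_int32_t mn_object;		/* remote object/region identifier */
};

int
migfs_read(void *v)
{
	struct vop_read_args /* {
		struct vnode *a_vp;
		struct uio *a_uio;
		int a_ioflag;
		struct ucred *a_cred;
	} */ *ap = v;
	struct migfs_node *mn = ap->a_vp->v_data;

	/*
	 * XXX Ask mn->mn_peer for the block covering a_uio->uio_offset,
	 * then uiomove() the reply into the caller's buffer.  The wire
	 * protocol is the part you actually have to invent.
	 */
	return (EOPNOTSUPP);
}

migfs_write is the same thing in the other direction.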

> Well, that's it.  You can only add *new* stuff, not modify existing 
> stuff unless it has specific hooks for it.  This problem is not 
> unique to NetBSD.  The DS1 spacecraft code (which uses the VxWorks 
> real-time kernel) has the same problem.  Solving that problem is what 
> ld is for, and ld wasn't built to un-ld. /-(
> 
> If anyone knows of a general solution to the problem please let me 
> know.  I might even get a bonus.

You'd have to modify existing references. You'd have to stop the kernel
HARD, make the pages r/w, modify the references, restore the protection,
and resume. Kinda scary.

> At 12:57 AM -0400 8/20/00, gabriel rosenkoetter wrote:
> >And further, from vnd(4):
> >
> >BUGS
> >     The vnd driver does not work if the file does not reside in a local
> >     filesystem.
> >
> >Definitely not making me think I'm interested in the vnode pager..

I'm not sure exactly what the problem is, but it's more an NFS-and-vnd
interaction. You do need to look at the memory starvation issues which vnd
can hit (if all your pages are dirty and you need clean pages to get rid
of the dirty ones, you lose badly), but that's about it.

> So scratch my NFS to MFS idea.  Does this mean that NetBSD doesn't 
> support diskless machines at all?

We have to get around the spl issues somehow. I'm not sure how, but we do
support swap over NFS.

Take care,

Bill