tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Ownership of uvm_object and vm_page (was Re: XIP)
On Sat, Oct 30, 2010 at 06:55:42PM -0700, Chuck Silvers wrote:
> On Wed, Oct 27, 2010 at 06:38:11PM +0900, Masao Uebayashi wrote:
> > On Tue, Oct 26, 2010 at 02:06:38PM -0700, Chuck Silvers wrote:
> > > On Mon, Oct 25, 2010 at 02:09:43AM +0900, Masao Uebayashi wrote:
> > > > I think the uebayasi-xip branch is ready to be merged.
> > >
> > > hi,
> > >
> > > here's what I found looking at the current code in the branch:
> > >
> > >
> > > - the biggest issue I had with the version that I reviewed earlier
> > > was that it muddled the notion of a "managed" page. you wanted
> > > to create a new kind of partially-managed page for XIP devices
> > > which would not be managed by the UVM in the sense that it could
> > > contain different things depending on what was needed but only that
> > > the page's mappings would be tracked so that pmap_page_protect()
> > > could be called on it. this is what led to all the pmap changes
> > > the pmaps needed to be able to handle being called with a vm_page
> > > pointer that didn't actually point to a struct vm_page.
> > >
> > > it looks like you've gotten rid of that, which I like, but you've
> > > replaced it with allocating a full vm_page structure for every page in
> > > an XIP device, which seems like a waste of memory. as we discussed
> > > earlier, I think it would be better to treat XIP pages as unmanaged
> > > and change a few other parts of UVM to avoid needing to track the
> > > mappings of XIP page mappings. I have thoughts on how to do all that,
> > > which I'll list at the end of this mail. however, if XIP devices
> > > are small enough that the memory overhead of treating device pages
> > > as managed is reasonable, then I'll go along with it.
> > > so how big do these XIP devices get?
> >
> > It's waste of memory, yes. With 64M ROM on arm (4K page, 80byte vm_page),
> > the array is 1.25M. If vm_page's made a single pointer (sizeof(void
> > *) == 4), the array size becomes 64K. Not small difference.
> > Typical XIP'ed products would be mobile devices with FlashROM, or small
> > servers with memory disk (md or xmd). About 16M~1G RAM/ROM?
> >
> > I made it back to have vm_page to simplify code. We can make it
> > to vm_page_md or whatever minimal, once after we figure out the
> > new design of "MI vm_page_md".
> >
> > > either way, the changes to the various pmaps to handle the fake vm_page
> > > pointers aren't necessary anymore, so all those changes should be
> > > reverted.
> >
> > Those mechanical vm_page -> vm_page_md changes done in pmaps have
> > a valid point by itself. Passing around vm_page pointer across
> > pmap functions is unnecessary. I'd rather say wrong. All pmap
> > needs is vm_page_md.
> >
> > I'd propose to do this vm_page -> vm_page_md change in pmaps first
> > in HEAD and sync the branch with it, rather than revert it.
>
> that seems a bit premature, don't you think?
> since you're already talking about redesigning it?
>
> ... apparently not, you already checked it in.
>
>
> > > it doesn't look like the PMAP_UNMANAGED flag that you added before
> > > is necessary anymore either, is that right? if so, it should also
> > > be reverted. if not, what's it for now?
> >
> > pmap_enter() passes paddr_t directly to pmap. pmap has no clue if
> > the given paddr_t is to be cached/uncached. So a separate flag,
> > PMAP_UNMANAGED. Without it, a FlashROM which is registered as
> > (potentially) managed (using bus_space_physload_device) is always
> > mapped to user address as cached, even if it's not XIP. This made
> > userland flash writer program not work.
> >
> > The point is whether to be cached or not is decided by virtual
> > address. Thus such an information should be stored in vm_map_entry
> > explicitly, in theory.
> >
> > (This is related to framebuffers too.)
>
> there is already an MI mechanism for specifying cachability of mappings,
> see PMAP_NOCACHE and related flags.
>
>
> > > - I mentioned before that I think the driver for testing this using
> > > normal memory to simulate an XIP device should just be the existing
> > > "md" driver, rather than a whole new driver whose only purpose
> > > would be testing the XIP code. however, you wrote a separate
> > > "xmd" driver anyway. so I'll say again: I think the "xmd" should be
> > > merged into the "md" driver.
> >
> > It has turned out that the new xmd(4) is not really md(4); it's not
> > md(4) modified to support XIP
> > but rather
> > rom(4) emulated using RAM
> >
> > I plan to merge xmd(4) into rom(4) or flash(4) when it's ready.
> > If you don't like xmd(4) to reserve dev namespace, I'd rather back
> > out xmd(4) totally.
> >
> > If you don't like the name, suggest a better one. ;)
>
> I'll respond to this in the other thread you started about it.
> but creating a temporary device driver does seem pretty silly.
>
>
> > > you also have an "xmd_machdep.c" for various platforms, but nothing
> > > in that file is really machine-specific. rather than use the
> > > machine-specific macros for converting between bytes and pages,
> > > it would be better to either use the MI macros (ptoa / atop)
> > > or just shift by PAGE_SHIFT, so that there doesn't need to be
> > > extra code written for each platform.
> >
> > Well, xmd_machdep.c is not really xmd(4)-specific. It *is*
> > machine-specific. It does vtophys(), not good put in MI part.
> >
> > ... and now I realize xmd_machdep_* should be renamed as pmap_*.
> > I'll fix this.
>
> and this in the other thread too.
>
>
> > > - as we discussed before, the addition of bus_space_physload() and
> > > related
> > > interfaces seems very strange to me, especially since the two
> > > implementations
> > > you've added both do exactly the same thing (call bus_space_mmap() on
> > > the
> > > start and end of the address range, and then call uvm_page_physload()
> > > with the resulting paddr_t values). is there any reason this can't be
> > > a MI function? also, the non-device versions of these are unused.
> >
> > I'll move these code to sys/common.
>
> the location in the source tree isn't the weird part.
> but a common implementation is an improvement.
>
>
> > bus_space_physload(9) will be used when a driver wants to register
> > its memory as a general purpose memory (linked into free list).
> > Think PCI RAM over bridges. I've not written a driver to do this,
> > but it's trivial.
> >
> > We also have to fix uvm_page_physload() so that it can be called
> > at run-time, not only boot-time. But this is beyond this branch.
> >
> > > - in many files the only change you made was to replace the include of
> > > <uvm/uvm_extern.h> with <uvm/uvm.h>, what's the reason for that?
> >
> > To reduce rebuild time when I worked on uvm_page.[ch]. Basically
> > uvm_page.h is internal to UVM. Touching it outside of UVM is simply
> > wrong.
> >
> > I'll put it back before the merge, to avoid potential build failures.
>
> thanks.
>
>
> > > - we talked before about removing the "xip" mount option and enabling
> > > this functionality automatically when it's available, which you did
> > > but then recently changed back. so I'll ask again,
> > > why do we need a mount option?
> >
> > I put it back because otherwise I can't know if a mount is XIP or
> > not explicitly. Mount option is the only way to know mount condition.
> > This is more a VFS/mount design problem.
>
> is your desire to control whether the mount accesses the device via mappings,
> or just to be able to see whether or not the device is being accessed via
> mappings?
>
>
> > > - as we've discussed before, I think that the XIP genfs_getpages()
> > > should be merged with the existing one before you merge this into
> > > -current. merging it as-is would make a mess, and it's better to
> > > avoid creating messes than than to promise to clean them up later.
> > > we've probably already spent more time arguing about it than
> > > it would have taken you to do it.
> > >
> > > - similarly, I don't think that we should have a separate XIP putpages.
> > > why doesn't the existing genfs_do_putpages() work for XIP pages as-is?
> >
> > This part I'm not convinced. Your argument can be hardly an excuse
> > for you to not work on merging genfs_getpages() and genfs_do_io(). ;)
> >
> > If you *really* want me to work on this before the merge (which I
> > do hope not), I'd create a separate branch to refactor genfs totally
> > into smaller pieces as I did for uvm_fault.c. If you trust my
> > refactoring-fu.
>
> well, it's been about a decade since I wrote this code so my memory is
> a bit hazy as to how it ended up that way, but from looking at the cvs
> history a bit just now, I think that the getpages and putpages paths
> started off completely separate, and when I added the directio code,
> I split genfs_do_io() out of the putpages code and called it from both.
> yes, it would have been good to merge the getpages version at that point
> too, but there wasn't a point where I took a unified code path and made
> a copy of it.
>
> and I'm not giving any argument for not merging these functions,
> I'm saying that their existance is not a good excuse to make another copy.
>
> to try to answer my own question about why you have a separate putpages,
> I guess that's to handle the page of zeroes that you have attached to
> each vnode. without the per-vnode page of zeroes, the existing
> putpages would work fine.
I've understood this problem. What's going wrong here is:
1) UVM assumes that vm_page is owned by single uvm_object, and
2) UVM assumes that vm_page is the page identity.
To correct these design problems, we'd have to do like:
LIST_HEAD(vm_owners, vm_owner);
struct uvm_object {
struct vm_owners *vm_owner_list_head;
struct rb_node vm_owner_rb_node;
};
struct vm_owner {
LIST_ENTRY(vm_ownder) uoe_list_entry;
struct rb_node *uoe_rb_entry;
struct uvm_object *uoe_uobj;
off_t uoe_offset;
struct vm_id *uoe_id;
};
struct vm_id {
struct vm_physseg *vi_phys;
off_t vi_offset;
};
struct vm_physseg {
struct vm_page **pgs;
:
};
struct vm_page {
:
};
This means that a vm_page can be shared by multiple uvm_objects.
This design is natural. When a user copies a file, just allocate
the "owner of vm_page" part, link them into the new vnode. The
vm_owner holds the data that what offset it exists in its parent
uvm_object.
This design also answers how to share a single zero page among all
vnodes, and why we don't support O->O loan. We already have
PG_RDONLY, so I don't see any reason why we can't achieve this.
Home |
Main Index |
Thread Index |
Old Index