Re: XIP

To: Masao Uebayashi <uebayasi%tombi.co.jp@localhost>
Subject: Re: XIP
From: Chuck Silvers <chuq%chuq.com@localhost>
Date: Sat, 30 Oct 2010 18:55:42 -0700
On Wed, Oct 27, 2010 at 06:38:11PM +0900, Masao Uebayashi wrote:
> On Tue, Oct 26, 2010 at 02:06:38PM -0700, Chuck Silvers wrote:
> > On Mon, Oct 25, 2010 at 02:09:43AM +0900, Masao Uebayashi wrote:
> > > I think the uebayasi-xip branch is ready to be merged.
> > 
> > hi,
> > 
> > here's what I found looking at the current code in the branch:
> > 
> > 
> >  - the biggest issue I had with the version that I reviewed earlier
> >    was that it muddled the notion of a "managed" page.  you wanted
> >    to create a new kind of partially-managed page for XIP devices
> >    which would not be managed by the UVM in the sense that it could
> >    contain different things depending on what was needed but only that
> >    the page's mappings would be tracked so that pmap_page_protect()
> >    could be called on it.  this is what led to all the pmap changes
> >    the pmaps needed to be able to handle being called with a vm_page
> >    pointer that didn't actually point to a struct vm_page.
> > 
> >    it looks like you've gotten rid of that, which I like, but you've
> >    replaced it with allocating a full vm_page structure for every page in
> >    an XIP device, which seems like a waste of memory.  as we discussed
> >    earlier, I think it would be better to treat XIP pages as unmanaged
> >    and change a few other parts of UVM to avoid needing to track the
> >    mappings of XIP page mappings.  I have thoughts on how to do all that,
> >    which I'll list at the end of this mail.  however, if XIP devices
> >    are small enough that the memory overhead of treating device pages
> >    as managed is reasonable, then I'll go along with it.
> >    so how big do these XIP devices get?
> 
> It's waste of memory, yes.  With 64M ROM on arm (4K page, 80byte vm_page),
> the array is 1.25M.  If vm_page's made a single pointer (sizeof(void
> *) == 4), the array size becomes 64K.  Not small difference.
> Typical XIP'ed products would be mobile devices with FlashROM, or small
> servers with memory disk (md or xmd).  About 16M~1G RAM/ROM?
> 
> I made it back to have vm_page to simplify code.  We can make it
> to vm_page_md or whatever minimal, once after we figure out the
> new design of "MI vm_page_md".
> 
> >    either way, the changes to the various pmaps to handle the fake vm_page
> >    pointers aren't necessary anymore, so all those changes should be 
> > reverted.
> 
> Those mechanical vm_page -> vm_page_md changes done in pmaps have
> a valid point by itself.  Passing around vm_page pointer across
> pmap functions is unnecessary.  I'd rather say wrong.  All pmap
> needs is vm_page_md.
> 
> I'd propose to do this vm_page -> vm_page_md change in pmaps first
> in HEAD and sync the branch with it, rather than revert it.

that seems a bit premature, don't you think?
since you're already talking about redesigning it?

... apparently not, you already checked it in.


> >    it doesn't look like the PMAP_UNMANAGED flag that you added before
> >    is necessary anymore either, is that right?  if so, it should also
> >    be reverted.  if not, what's it for now?
> 
> pmap_enter() passes paddr_t directly to pmap.  pmap has no clue if
> the given paddr_t is to be cached/uncached.  So a separate flag,
> PMAP_UNMANAGED.  Without it, a FlashROM which is registered as
> (potentially) managed (using bus_space_physload_device) is always
> mapped to user address as cached, even if it's not XIP.  This made
> userland flash writer program not work.
> 
> The point is whether to be cached or not is decided by virtual
> address.  Thus such an information should be stored in vm_map_entry
> explicitly, in theory.
> 
> (This is related to framebuffers too.)

there is already an MI mechanism for specifying cachability of mappings,
see PMAP_NOCACHE and related flags.


> >  - I mentioned before that I think the driver for testing this using
> >    normal memory to simulate an XIP device should just be the existing
> >    "md" driver, rather than a whole new driver whose only purpose
> >    would be testing the XIP code.  however, you wrote a separate
> >    "xmd" driver anyway.  so I'll say again: I think the "xmd" should be
> >    merged into the "md" driver.
> 
> It has turned out that the new xmd(4) is not really md(4); it's not
>       md(4) modified to support XIP
> but rather
>       rom(4) emulated using RAM
> 
> I plan to merge xmd(4) into rom(4) or flash(4) when it's ready.
> If you don't like xmd(4) to reserve dev namespace, I'd rather back
> out xmd(4) totally.
> 
> If you don't like the name, suggest a better one. ;)

I'll respond to this in the other thread you started about it.
but creating a temporary device driver does seem pretty silly.


> >    you also have an "xmd_machdep.c" for various platforms, but nothing
> >    in that file is really machine-specific.  rather than use the
> >    machine-specific macros for converting between bytes and pages,
> >    it would be better to either use the MI macros (ptoa / atop)
> >    or just shift by PAGE_SHIFT, so that there doesn't need to be
> >    extra code written for each platform.
> 
> Well, xmd_machdep.c is not really xmd(4)-specific.  It *is*
> machine-specific.  It does vtophys(), not good put in MI part.
> 
> ... and now I realize xmd_machdep_* should be renamed as pmap_*.
> I'll fix this.

and this in the other thread too.


> >  - as we discussed before, the addition of bus_space_physload() and related
> >    interfaces seems very strange to me, especially since the two 
> > implementations
> >    you've added both do exactly the same thing (call bus_space_mmap() on the
> >    start and end of the address range, and then call uvm_page_physload()
> >    with the resulting paddr_t values).  is there any reason this can't be
> >    a MI function?   also, the non-device versions of these are unused.
> 
> I'll move these code to sys/common.

the location in the source tree isn't the weird part.
but a common implementation is an improvement.


> bus_space_physload(9) will be used when a driver wants to register
> its memory as a general purpose memory (linked into free list).
> Think PCI RAM over bridges.  I've not written a driver to do this,
> but it's trivial.
> 
> We also have to fix uvm_page_physload() so that it can be called
> at run-time, not only boot-time.  But this is beyond this branch.
> 
> >  - in many files the only change you made was to replace the include of
> >    <uvm/uvm_extern.h> with <uvm/uvm.h>, what's the reason for that?
> 
> To reduce rebuild time when I worked on uvm_page.[ch].  Basically
> uvm_page.h is internal to UVM.  Touching it outside of UVM is simply
> wrong.
> 
> I'll put it back before the merge, to avoid potential build failures.

thanks.


> >  - we talked before about removing the "xip" mount option and enabling
> >    this functionality automatically when it's available, which you did
> >    but then recently changed back.  so I'll ask again,
> >    why do we need a mount option?
> 
> I put it back because otherwise I can't know if a mount is XIP or
> not explicitly.  Mount option is the only way to know mount condition.
> This is more a VFS/mount design problem.

is your desire to control whether the mount accesses the device via mappings,
or just to be able to see whether or not the device is being accessed via
mappings?


> >  - as we've discussed before, I think that the XIP genfs_getpages()
> >    should be merged with the existing one before you merge this into
> >    -current.  merging it as-is would make a mess, and it's better to
> >    avoid creating messes than than to promise to clean them up later.
> >    we've probably already spent more time arguing about it than
> >    it would have taken you to do it.
> > 
> >  - similarly, I don't think that we should have a separate XIP putpages.
> >    why doesn't the existing genfs_do_putpages() work for XIP pages as-is?
> 
> This part I'm not convinced.  Your argument can be hardly an excuse
> for you to not work on merging genfs_getpages() and genfs_do_io(). ;)
> 
> If you *really* want me to work on this before the merge (which I
> do hope not), I'd create a separate branch to refactor genfs totally
> into smaller pieces as I did for uvm_fault.c.  If you trust my
> refactoring-fu.

well, it's been about a decade since I wrote this code so my memory is
a bit hazy as to how it ended up that way, but from looking at the cvs
history a bit just now, I think that the getpages and putpages paths
started off completely separate, and when I added the directio code,
I split genfs_do_io() out of the putpages code and called it from both.
yes, it would have been good to merge the getpages version at that point
too, but there wasn't a point where I took a unified code path and made
a copy of it.

and I'm not giving any argument for not merging these functions,
I'm saying that their existance is not a good excuse to make another copy.

to try to answer my own question about why you have a separate putpages,
I guess that's to handle the page of zeroes that you have attached to
each vnode.  without the per-vnode page of zeroes, the existing
putpages would work fine.


> >  - in getpages, rather than allocating a page of zeros for each file,
> >    it would be better to use the same page of zeroes that mem.c uses.
> >    if multiple subsystems need a page of zeroes, then I think that
> >    UVM should provide this functionality rather than duplicating
> >    that all over the place.  if you don't want to update all the copies
> >    of mem.c, that's fine, but please put this in UVM instead of genfs.
> 
> I made it per-vnode, because If you share a single physical zero'ed
> page, when a vnode is putpage'ed, it invalidates all mappings in
> other vnodes pointing to the zero page in a pmap.  Doesn't this
> sound strange?

I think you're misunderstanding something.  for operations like
msync(), putpage is called on the object in the UVM map entry,
which would never be the object owning the global page of zeroes.
from the pagedaemon, putpage is called on a page's object,
but if the page replacement policy decides to reclaim the page
of zeroes then invalidating all the mappings is necessary.
if what you describe would actually happen, then yes,
I would find it strange.


> Another reason is that by having a vm_page with a uvm_object as
> its parent, fault handler doesn't need to know anything about the
> special zero'ed page.

the fault handler wouldn't need to know about the page of zeroes,
why do you think it would?  it does need to be aware that the pages
returned by getpages may not be owned by the object that getpages
is called with, but it needs to handle that regardless, for layered
file systems.


> >  - I don't remember if I asked about this before: is there some reason
> >    for there to be XIP code in the fs-specific code?  couldn't this be
> >    done in the VFS and vnode code instead so that it works for all
> >    file system types instead of just FFS?
> 
> This is due to the current VFS design.  I agree that fs-neutral part
> should be moved to common part.  It's just beyond this branch...

could you explain what about the VFS design makes this hard?

I guess I mentioned this mainly because in your earlier mail
it sounded like this feature would be available for all fs types.


> >  - as you guessed in your mail, I don't like the new PQ_FIXED flag either.
> >    it looks like it's only used for preventing the XIP pages from
> >    being put on the paging queues, but that would be better done by
> >    just initializing pg->wire_count to 1 so that UVM treats them
> >    as permanently non-pageable (which they actually are).
> 
> Are you OK to prevent XIP pages from being loaned? ;)
> 
> The actual problems here are
> 
>  a) the generic fault handler doesn't know how to handle pages that
>     never be paged out (device pages), and
> 
>  b) uvm loan uses vm_page.
> 
> You wanted device pages to be able to be loaned.  I agree the idea,
> but I'm feeling like that the way the loan "state" is kept
> (vm_page::loan_count) is wrong.  I've not figured out how to address
> this yet.

I've thought about loaning and XIP again quite a bit, but in the interest
of actually sending this mail today, I'll leave this part for later too.

-Chuck
Follow-Ups:
- Ownership of uvm_object and vm_page (was Re: XIP)
  - From: Masao Uebayashi
- Re: XIP
  - From: Mindaugas Rasiukevicius
- Re: XIP
  - From: Masao Uebayashi
References:
- XIP
  - From: Masao Uebayashi
- Re: XIP
  - From: Chuck Silvers
- Re: XIP
  - From: Masao Uebayashi
Prev by Date: Re: Fixes for kern/40018 -- any chance of getting these pulled into the -current and 5.x trees?
Next by Date: Re: XIP
Previous by Thread: Re: pmap_extract(9) (was Re: xmd(4) (Re: XIP))
Next by Thread: Re: XIP
Indexes:
Home | Main Index | Thread Index | Old Index