tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: CVS commit: src/sys/uvm



On Mon, Jan 03, 2011 at 10:28:17PM -0800, Matt Thomas wrote:
> 
> On Jan 3, 2011, at 9:01 PM, Masao Uebayashi wrote:
> 
> > I take silence as "no objection".
> 
> Not so safe.
> 
> > On Thu, Dec 23, 2010 at 12:48:04PM +0900, Masao Uebayashi wrote:
> >> On Wed, Dec 22, 2010 at 05:37:57AM +0000, YAMAMOTO Takashi wrote:
> >>> hi,
> >>> 
> >>>> Could you ack this discussion?
> >>> 
> >>> sorry for dropping a ball.
> >>> 
> >>>> 
> >>>> On Tue, Dec 07, 2010 at 01:19:46AM +0900, Masao Uebayashi wrote:
> >>>>> On Thu, Nov 25, 2010 at 11:32:39PM +0000, YAMAMOTO Takashi wrote:
> >>>>>> [ adding cc: tech-kern@ ]
> >>>>>> 
> >>>>>> hi,
> >>>>>> 
> >>>>>>> On Wed, Nov 24, 2010 at 11:26:39PM -0800, Matt Thomas wrote:
> >>>>>>>> 
> >>>>>>>> On Nov 24, 2010, at 10:47 PM, Masao Uebayashi wrote:
> >>>>>>>> 
> >>>>>>>>> On Thu, Nov 25, 2010 at 05:44:21AM +0000, YAMAMOTO Takashi wrote:
> >>>>>>>>>> hi,
> >>>>>>>>>> 
> >>>>>>>>>>> On Thu, Nov 25, 2010 at 04:18:25AM +0000, YAMAMOTO Takashi wrote:
> >>>>>>>>>>>> hi,
> >>>>>>>>>>>> 
> >>>>>>>>>>>>> Hi, thanks for review.
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> On Thu, Nov 25, 2010 at 01:58:04AM +0000, YAMAMOTO Takashi 
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>> hi,
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> - what's VM_PHYSSEG_OP_PG?
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> It's to lookup vm_physseg by "struct vm_page *", relying on that
> >>>>>>>>>>>>> "struct vm_page *[]" is allocated linearly.  It'll be used to 
> >>>>>>>>>>>>> remove
> >>>>>>>>>>>>> vm_page::phys_addr as we talked some time ago.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> i'm not sure if commiting this unused uncommented code now helps 
> >>>>>>>>>>>> it.
> >>>>>>>>>>>> some try-and-benchmark cycles might be necessary given that
> >>>>>>>>>>>> vm_page <-> paddr conversion could be performace critical.
> >>>>>>>>>>> 
> >>>>>>>>>>> If you really care performance, we can directly pass "struct 
> >>>>>>>>>>> vm_page
> >>>>>>>>>>> *" to pmap_enter().
> >>>>>>>>>>> 
> >>>>>>>>>>> We're doing "struct vm_page *" -> "paddr_t" just before 
> >>>>>>>>>>> pmap_enter(),
> >>>>>>>>>>> then doing "paddr_t" -> "vm_physseg" reverse lookup again in
> >>>>>>>>>>> pmap_enter() to check if a given PA is managed.  What is really
> >>>>>>>>>>> needed here is, to lookup "struct vm_page *" -> "vm_physseg" once
> >>>>>>>>>>> and you'll know both paddr_t and managed or not.
> >>>>>>>>>> 
> >>>>>>>>>> i agree that the current code is not ideal in that respect.
> >>>>>>>>>> otoh, i'm not sure if passing vm_physseg around is a good idea.
> >>>>>>>>> 
> >>>>>>>>> It's great you share the interest.
> >>>>>>>>> 
> >>>>>>>>> I chose vm_physseg, because it was there.  I'm open to alternatives,
> >>>>>>>>> but I don't think you have many options...
> >>>>>>>> 
> >>>>>>>> Passing vm_page * doesn't work if the page isn't managed since there
> >>>>>>>> won't be a vm_page for the paddr_t.
> >>>>>>>> 
> >>>>>>>> Now passing both paddr_t and vm_page * would work and if the pointer
> >>>>>>>> to the vm_page, it would be an unmanaged mapping.  This also gives 
> >>>>>>>> the
> >>>>>>>> access to mdpg without another lookup.
> >>>>>>> 
> >>>>>>> What if XIP'ed md(4), where physical pages are in .data (or .rodata)?
> >>>>>>> 
> >>>>>>> And don't forget that you're the one who first pointed out that
> >>>>>>> allocating vm_pages for XIP is a pure waste of memory. ;)
> >>>>>> 
> >>>>>> i guess matt meant "if the pointer to the vm_page is NULL,".
> >>>>>> 
> >>>>>>> 
> >>>>>>> I'm allocating vm_pages, only because of phys_addr and loan_count.
> >>>>>>> I believe vm_pages is unnecessary for read-only XIP segments.
> >>>>>>> Because they're read-only, and stateless.
> >>>>>>> 
> >>>>>>> I've already concluded that the current "managed or not" model
> >>>>>>> doesn't work for XIP.  I'm pretty sure that my vm_physseg + off_t
> >>>>>>> model can explain everything.  I'm rather waiting for a counter
> >>>>>>> example how vm_physseg doesn't work...
> >>>>>> 
> >>>>>> i guess your suggestion is too vague.
> >>>>>> where do you want to use vm_physseg * + off_t instead of vm_page * ?
> >>>>>> getpages, pmap_enter, and?  how their function prototypes would be?
> >>>>> 
> >>>>> The basic idea is straightforward; always allocate vm_physseg for
> >>>>> memories/devices.  If a vm_physseg is used as general purpose
> >>>>> memory, you allocate vm_page[] (as vm_physseg::pgs).  If it's
> >>>>> potentially mapped as cached, you allocate pvh (as vm_physseg:pvh).
> >>> 
> >>> can you provide function prototypes?
> >> 
> >> I have no real code for this big picture at this moment.  Making
> >> vm_physseg available as reference is the first step.  This only
> >> changes uvm_page_physload() to return a pointer:
> >> 
> >>    -void uvm_page_physload();
> >>    +void *uvm_page_physload();
> >> 
> >> But this makes XIP pager MUCH cleaner.  The reason has been explained
> >> many times.
> >> 
> >> Making fault handlers and pagers to use vm_physseg * + off_t is
> >> the next step, and I don't intend to work on it now.  I just want
> >> to explain the big picture.
> >> 
> >>> 
> >>>>> 
> >>>>> Keep vm_physseg * + off_t array on stack.  If UVM objects uses
> >>>>> vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
> >>>>> + off_t *once* and cache it on stack.
> >>> 
> >>> do you mean something like this?
> >>>   struct {
> >>>           vm_physseg *hoge;
> >>>           off_t fuga;
> >>>   } foo [16];
> >> 
> >> Yes.
> >> 
> >> Or cache vm_page * with it, like:
> >> 
> >>    struct vm_id {
> >>            vm_physseg *seg;
> >>            off_t off;
> >>            vm_page *pg;
> >>    };
> >> 
> >>    uvm_fault()
> >>    {
> >>            vm_id pgs[];
> >>            :
> >>    }
> >> 
> >> Vnode pager (genfs_getpages) takes vm_page's by looking up
> >> vnode::v_uobj's list, or uvn_findpages().
> >> 
> >> When it returns back to fault handler, we have to lookup vm_physseg
> >> for each page.  Then fill the "seg" slot above (assuming we'll
> >> remove vm_page::phys_addr soon).
> >> 
> >> Fault handler calls per-vm_page operations iff vm_page slot is filled.
> >> XIP pages are not pageq'ed.  XIP pages don't need vm_page, but
> >> cached because it's vnode.
> >> 
> >> (Just in case, have you read my paper?)
> 
> Still on my TODO list.  I've been preparing to merge the mips64 branch the my 
> mpc85xx code.
> 
> One thing I've been thinking about is how to use an MMU's capability to map 
> >1 page with a single TLB entry or PTE.

This would be easy for cdevs if vm_physseg is allocated; cdev device
pager returns the vm_physseg back to the fault handler.  UVM can
know if the faulting page is continugous physically or virtually
(vm_map_entry).  If both match, tell pmap to use larger TLB.

> 
> If we are going with vm_physseg as a fundamental VM data structure, maybe we 
> should think it about how to use it a little more.
> 
> should it have a 
> 
> struct vm_page *(*ps_off2pg)(struct vm_physseg *, u_long frame_no);
> struct vm_page_md *(*ps_off2md)(struct vm_physseg *, u_long frame_no);
> 
> or similar functions?

Typical vnode lookup done in fault handling is:
        vaddr_t -> vm_map_entry -> uobj -> vm_page -> vm_physseg -> vm_page_md

As I already explained many times, you need only 2 kind of lookups:
        vm_page -> vm_physseg + off
        vm_physseg + off -> vm_page_md (pvh)


Home | Main Index | Thread Index | Old Index