Re: i386 lazy pmap switching in trap.c
On Sun, Feb 10, 2008 at 12:29:34AM +0900, YAMAMOTO Takashi wrote:
> > >
> > > the motivation of onfault_handler() was the opposite; it was to optimize
> > > copyin/out by eliminating the "mark the lwp", which is only necessary for
> > > rare cases like preemption/faults in the middle of operations.
> > Well, since copyin/out need a check to detect whether the pmap needs
> > to be loaded, the same check could 'lock' the pmap active for that lwp.
> > None of these checks would need any locking.
> when i did the change, i did some micro-benchmarks. i forgot details,
> but iirc it showed small differences.
I've been doing some today as well.
The values I'm getting are rather randomised (I think by cache displacement
effects) - especially since the effect of changing a branch or two
isn't that great in itself (and I'm not using a P4-netburst where you
really don't want it to mispredict a branch).
> > I was also thinking that in-kernel pre-emption would happen at any
> > place where it isn't forbidden. So it would look just like (almost)
> > any other context switch, which means you would be doing
> > 'pmap_load_if_necessary()' in the normal lwp resume path - that path
> > doesn't want to be scanning a list of addresses...
> yes, the list is expected to be small enough.
> if it matters, we can arrange copy.S so that the list have only an entry.
> what do you mean by "the normal lwp resume path"? i don't think
> it's desirable to do it in cpu_switchto().
The value that indicates whether the pmap is valid could have 3 values:
-1 => pmap is invalid (ie is that of a different process)
0 => pmap is valid
1 => pmap valid and must be preserved
(it is a per-lwp value, but could be cached per-cpu, and only changeable
by the lwp itself)
copyin/out (etc) would increment the value on entry; if the result is 0
they then call pmap_load(), which would set it to 1 before returning.
At the end of copyin the value is decremented/set to zero.
Return to userspace could do much the same, relinquishing the pmap
inside the trap/interrupt handler.
The decrement on exit would be cheaper than the loop around pmap_load.
(I've just got a measurable performance gain from removing the loop.)
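As a rough illustration of the three-valued scheme above, here is a minimal user-space sketch. It is not kernel code: struct lwp_sketch, pmap_load_sketch(), copy_enter() and copy_exit() are hypothetical names invented for this example, and the 'loads' counter exists only to show when the slow path runs.

```c
#include <assert.h>

/* The three states described above (names are hypothetical). */
enum {
    PMAP_INVALID = -1,  /* pmap is that of a different process */
    PMAP_VALID   = 0,   /* pmap is valid */
    PMAP_PINNED  = 1    /* pmap valid and must be preserved */
};

/* Stand-in for the per-lwp value; not a real kernel structure. */
struct lwp_sketch {
    int pmap_state;
    int loads;          /* counts slow-path calls, for illustration only */
};

/* Stand-in for pmap_load(): loads the pmap and leaves it pinned. */
void pmap_load_sketch(struct lwp_sketch *l)
{
    l->loads++;
    l->pmap_state = PMAP_PINNED;
}

/* Entry to copyin/out: -1 becomes 0 (needs a load), 0 becomes 1 (pinned). */
void copy_enter(struct lwp_sketch *l)
{
    if (++l->pmap_state == 0)
        pmap_load_sketch(l);
}

/* Exit from copyin/out: drop the pin, the pmap stays valid. */
void copy_exit(struct lwp_sketch *l)
{
    assert(l->pmap_state == PMAP_PINNED);
    l->pmap_state = PMAP_VALID;
}
```

Starting from PMAP_INVALID, the first copy_enter() takes the slow path once; later enter/exit pairs are a single increment and a single store, with no loop and no locking, since only the lwp itself changes the value.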
> > The pcb_onfault path is hardly ever executed, AFAICT the lookup is only
> > done in order to decide whether to return EFAULT or to panic.
> > ie it isn't done in the paths that page in user memory.
> i don't understand what you mean by this paragraph.
> are you just explaining the code?
Sort of, but pointing out that the performance of the lookup doesn't
matter in that case.
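For illustration only, the kind of lookup being discussed might look like the sketch below: a linear scan over a short table of code address ranges, run only once a fault has already been found to be unrecoverable. The struct onfault_range layout and the onfault_lookup() name are assumptions made for this example, not the actual trap.c or copy.S code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a range of code addresses and its recovery point. */
struct onfault_range {
    uintptr_t start;    /* first address of the copy routine */
    uintptr_t end;      /* one past its last address */
    uintptr_t handler;  /* where to resume so the caller sees EFAULT */
};

/*
 * Return the recovery address for a faulting pc, or 0 if the pc is not
 * inside any listed routine (in which case the caller would panic).
 */
uintptr_t onfault_lookup(const struct onfault_range *tab, size_t n,
                         uintptr_t pc)
{
    for (size_t i = 0; i < n; i++) {
        if (pc >= tab[i].start && pc < tab[i].end)
            return tab[i].handler;
    }
    return 0;
}
```

Because this runs only on the EFAULT-or-panic path, a simple scan is adequate whether the table has one entry or several.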
David Laight: david%l8s.co.uk@localhost