Port-arm archive


Re: aarch64 performance tweaks



(continued)

>- Have unlikely conditional branches go forwards to help the static branch
>  predictor.

This is very interesting, but I wonder how effective it is in practice.
The function itself is small enough that it may be more effective to align functions to the cache line size.
(We might want to try CFLAGS+=-falign-functions=... and increasing _ALIGN_TEXT.)
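For reference, the "unlikely branch goes forward" idea is usually expressed in C through GCC/Clang's `__builtin_expect`, which NetBSD wraps as `__predict_false()`. The sketch below defines its own `PREDICT_FALSE` macro so it is self-contained. `struct item`, `lookup()` and `lookup_slow()` are hypothetical names for illustration only. The hint only lets the compiler move the rare path out of the fall-through sequence (typically after the hot code, so the jump to it is a forward branch); it does not guarantee any particular layout.

```c
#include <stddef.h>

/* Hypothetical local equivalent of NetBSD's __predict_false(). */
#define PREDICT_FALSE(x) __builtin_expect(((x) != 0), 0)

struct item {
	int value;
};

/* Hypothetical rare path, kept out of line so the hot path stays small. */
static __attribute__((noinline)) int
lookup_slow(const struct item *it)
{
	(void)it;
	return -1;
}

static int
lookup(const struct item *it)
{
	/*
	 * The NULL check is marked unlikely, so the compiler can emit the
	 * common case as straight-line code and place the call to the slow
	 * path after it; the branch taken for the rare case then points
	 * forward, matching a static "backward taken, forward not taken"
	 * predictor.
	 */
	if (PREDICT_FALSE(it == NULL))
		return lookup_slow(it);
	return it->value;
}
```

Whether this beats simply aligning the function, as suggested above, would have to be measured; comparing the generated code with and without the hint (e.g. `objdump -d`) shows what the compiler actually did.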


>- Use tpidr_el1 to hold curlwp and not curcpu, because curlwp is accessed
>  much more often by MI code.  It also makes curlwp preemption safe and
>  allows aarch64_curlwp() to be a const function (curcpu must be volatile).

BTW, n/aarch64 uses tpidr_*el0* as userland's TLS pointer; it is saved to l_private on exception entry and restored from l_private at eret.
Therefore, tpidr_el0 can be used freely in kernel context, so we might be able to use it for curcpu.
(Of course, to do this we would need to set "tpidr_el0 = curlwp->l_cpu" in el0_trap and lwp_trampoline.)
However, I'm not sure how much this would gain compared to just using curlwp->l_cpu... :-P
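To make the suggested register protocol concrete, here is a user-space simulation under stated assumptions. The system registers are modeled as plain C variables (`sim_tpidr_el0`, `sim_tpidr_el1`), and all `sim_*` functions and the trimmed `struct lwp`/`struct cpu_info` are hypothetical stand-ins, not the real kernel definitions. It shows the order of operations only: save the user TLS into `l_private`, load `tpidr_el0` with `curlwp->l_cpu` on entry from EL0, and restore the TLS before returning.

```c
#include <stdint.h>

/* Trimmed, hypothetical versions of the kernel structures. */
struct cpu_info {
	int ci_index;
};

struct lwp {
	struct cpu_info *l_cpu;
	void *l_private;	/* user TLS pointer saved from tpidr_el0 */
};

/* Simulated system registers; the real ones are accessed with mrs/msr. */
static uintptr_t sim_tpidr_el0;		/* user TLS in EL0, curcpu in kernel */
static struct lwp *sim_tpidr_el1;	/* curlwp, as in the proposed change */

/* curlwp is a single register read in the proposed scheme. */
static struct lwp *
sim_curlwp(void)
{
	return sim_tpidr_el1;
}

/* Exception entry from EL0 (el0_trap / lwp_trampoline in the suggestion). */
static void
sim_el0_trap_entry(void)
{
	struct lwp *l = sim_curlwp();

	l->l_private = (void *)sim_tpidr_el0;	/* save user TLS */
	sim_tpidr_el0 = (uintptr_t)l->l_cpu;	/* tpidr_el0 = curlwp->l_cpu */
}

/* While in the kernel, curcpu would be a single register read. */
static struct cpu_info *
sim_curcpu(void)
{
	return (struct cpu_info *)sim_tpidr_el0;
}

/* Just before eret to EL0: restore the user's TLS pointer. */
static void
sim_el0_trap_exit(void)
{
	sim_tpidr_el0 = (uintptr_t)sim_curlwp()->l_private;
}
```

Context switches and LWP migration, which would also have to keep the cached curcpu value correct, are not modeled here; that extra bookkeeping is part of the trade-off against simply reading curlwp->l_cpu.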


Otherwise, it all looks good to me. Thank you for the great work!
-- 
ryo shimizu

