Current-Users archive


Re: Increases in build system time

On 11/15/19, Andreas Gustafsson <> wrote:
> Mateusz Guzik wrote:
>> Can you get a kernel-side flamegraph?
> Done, using sources from 2019.


The first thing that jumps out at me is DIAGNOSTIC being on (seen with
e.g. _vstate_assert). Did your older kernels have it? If you just compiled
GENERIC from release branches it is presumably removed, so it would be
nice to retest without it.

Then there is very minor stuff which in isolation won't make a difference
but would be nice to take care of:
- pmap_page_copy uses memcpy, which performs a little bit extra work on
top of just copying - the size is known at compilation time and both
addresses are guaranteed to be aligned to 4096. Therefore it can just
copy without trying to align. iow this should use a dedicated routine.
- pmap_page_zero uses non-temporal stores, which are almost guaranteed
to only add to cache misses later on
- background page zeroing probably does not win anything and only adds
to contention on uvm_fpageqlock. I don't know if I'm reading this right,
but it seems the lock itself is only a spinlock to accommodate its use
from the idle loop. Should the feature be eliminated on amd64, the lock
can be converted to just a regular lock which would be faster single-threaded
(no interrupt crappery) and multi-threaded (no need to read off IPL from
the lock)
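
To illustrate the first point, here is a minimal userspace sketch of what
a dedicated page copy routine could look like. It is not NetBSD's pmap
code: the names page_copy_aligned and SKETCH_PAGE_SIZE are made up for the
example, and a real kernel routine would more likely be hand-written
assembly (e.g. rep movsq on amd64). The point is only that with a fixed
size and page-aligned addresses there is no alignment fixup and no tail
handling to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant for this sketch; matches the 4096 mentioned above. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Copy exactly one page. Both pointers are assumed to be page-aligned,
 * so the loop moves whole 64-bit words with a compile-time trip count
 * and never has to inspect alignment or handle a partial tail.
 */
static void
page_copy_aligned(void *dst, const void *src)
{
	/* Precondition check only; a kernel would guarantee this by construction. */
	assert(((uintptr_t)dst % SKETCH_PAGE_SIZE) == 0);
	assert(((uintptr_t)src % SKETCH_PAGE_SIZE) == 0);

	uint64_t *d = dst;
	const uint64_t *s = src;

	/* 512 words per page, unrolled by four. */
	for (size_t i = 0; i < SKETCH_PAGE_SIZE / sizeof(uint64_t); i += 4) {
		d[i + 0] = s[i + 0];
		d[i + 1] = s[i + 1];
		d[i + 2] = s[i + 2];
		d[i + 3] = s[i + 3];
	}
}
```

Whether this beats a tuned memcpy on any given CPU would have to be
measured; the sketch only shows the work that can be skipped.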

Here I don't see what uvm_fault_internal is contending on, it's most
likely the aforementioned uvm_fpageqlock. A couple of years back I wrote a
patch to batch ops using the lock, can probably be reasonably easily

That said, can you rerun without DIAGNOSTIC but with lockstat?

Mateusz Guzik <mjguzik>
