NetBSD-Bugs archive


Re: kern/53124 (FFS is slow because pmap_update doesn't scale)



Synopsis: FFS is slow because pmap_update doesn't scale

State-Changed-From-To: open->analyzed
State-Changed-By: maxv%NetBSD.org@localhost
State-Changed-When: Sat, 24 Mar 2018 19:10:03 +0000
State-Changed-Why:
I understand your point about the N-1 user threads. But as far as I
can tell/remember, that's not how things work.

The kernel is mapped in each pmap. When a kernel page is shot down, an
IPI is sent to _all_ CPUs, because the kernel is always mapped
everywhere, and as a result the TLB must be flushed everywhere too.

My guess is that the 'pmap_update' you're talking about actually
touches a user pmap, and not pmap_kernel. User pmaps are used 'lazily':
when a user-lwp -> kern-lwp context switch occurs the user pmap remains
loaded on the CPU, because the kernel is mapped in this pmap too and the
page tables don't need to be reloaded. Given this, my guess
is that your I/O program gets context-switched on several cores, and
since these cores then switch to the idle thread when your program
leaves, the pmap of your program remains loaded on them. As a result
each page modification in this pmap needs to be synchronized on each
core the program has executed on in the past; hence the IPIs, and the
slowdown.

By using N-1 user threads, you are forcing a kern-lwp -> user-lwp
transition on each core, and after that your pmap does not need to be
synchronized there anymore; so the latency disappears.
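
To make the guess above concrete, here is a toy model in C. It is only a
sketch of the reasoning, not NetBSD's pmap code: struct toy_pmap, NCPU,
switch_to_user, switch_to_kernel and shootdown_targets are all hypothetical
names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU 8  /* hypothetical core count for this toy model */

/* Toy stand-in for a user pmap: only the set of CPUs that have it loaded. */
struct toy_pmap {
    uint32_t loaded_on;  /* bit i set => CPU i still has this pmap loaded */
};

/* Which toy pmap is currently loaded on each CPU (NULL = none yet). */
struct toy_pmap *cpu_pmap[NCPU];

/* kern-lwp -> user-lwp switch: the CPU unloads the old pmap, loads pm. */
void switch_to_user(int cpu, struct toy_pmap *pm)
{
    assert(cpu >= 0 && cpu < NCPU);
    if (cpu_pmap[cpu] != NULL)
        cpu_pmap[cpu]->loaded_on &= ~(UINT32_C(1) << cpu);
    cpu_pmap[cpu] = pm;
    pm->loaded_on |= UINT32_C(1) << cpu;
}

/* user-lwp -> kern-lwp switch (e.g. to idle): lazy, the pmap stays loaded. */
void switch_to_kernel(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    /* Intentionally nothing: the kernel is mapped in the user pmap too. */
}

/* Remote CPUs that need an IPI when pm is modified from CPU self. */
int shootdown_targets(const struct toy_pmap *pm, int self)
{
    uint32_t remote = pm->loaded_on & ~(UINT32_C(1) << self);
    int n = 0;

    while (remote != 0) {
        remote &= remote - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

In this model, a program that has run on CPUs 0 to 3, each of which then went
idle, makes shootdown_targets() return 3 when called from CPU 0. Once other
user lwps have been switched in on CPUs 1 to 3, it returns 0, which is the
effect attributed above to the N-1 user threads.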

But this guess would have to be verified. You should probably try to
assign your program to a given core, and do this early, _before_ your
program starts doing heavy stuff. schedctl can do that; a pset would be
even better. If I'm right, it should "fix" the slowdown.

(Please CC me in the answer if any)




