Re: kern/53124 (FFS is slow because pmap_update doesn't scale)

To: kern-bug-people%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,maxv%NetBSD.org@localhost,mlelstv%serpens.de@localhost
Subject: Re: kern/53124 (FFS is slow because pmap_update doesn't scale)
From: maxv%NetBSD.org@localhost
Date: Sat, 24 Mar 2018 19:10:03 +0000 (UTC)

Synopsis: FFS is slow because pmap_update doesn't scale

State-Changed-From-To: open->analyzed
State-Changed-By: maxv%NetBSD.org@localhost
State-Changed-When: Sat, 24 Mar 2018 19:10:03 +0000
State-Changed-Why:
I understand your point about the N-1 user threads. But as far as I
can tell/remember, that's not how things work.

The kernel is mapped in each pmap. When a kernel page is shot down, an
IPI is sent to _all_ CPUs, because the kernel is always mapped
everywhere, and as a result the TLB must be flushed everywhere too.

My guess is that the 'pmap_update' you're talking about actually
touches a user pmap, and not pmap_kernel. User pmaps are used 'lazily':
when a user-lwp -> kern-lwp context switch occurs, the user pmap remains
loaded on the CPU. The reason is that the kernel is mapped in this pmap
as well, so we don't need to reload the page tables. Given this, my guess
is that your I/O program gets context-switched on several cores, and
since these cores then switch to the idle thread when your program
leaves, the pmap of your program remains loaded on them. As a result
each page modification in this pmap needs to be synchronized on each
core the program has executed on in the past; hence the IPIs, and the
slowdown.

By using N-1 user threads, you are forcing a kern-lwp -> user-lwp
transition on each core, and after that your pmap does not need to be
synchronized there anymore; so the latency disappears.

But this guess would have to be verified. You should probably try to
bind your program to a given core, and do so early, _before_ your
program starts doing heavy work. You can use schedctl for that, or pset,
which would be even better. If I'm right, it should "fix" the slowdown.
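
To make the hypothesis concrete, here is a toy model of it in Python. It
is only a sketch of the reasoning above, not of the actual NetBSD pmap
code: the Machine class, its methods and the "io_prog"/"spin" names are
all made up for illustration, and the only rule it encodes is the
assumed one, namely that a user pmap stays loaded on a CPU until another
user lwp runs there, and that a shootdown must reach every CPU where the
pmap is still loaded.

```python
KERNEL = "pmap_kernel"  # stand-in label for the kernel pmap


class Machine:
    """Toy model (hypothetical): which CPUs must receive a shootdown IPI."""

    def __init__(self, ncpu):
        self.ncpu = ncpu
        # User pmap currently loaded on each CPU (None = none loaded yet).
        self.loaded = [None] * ncpu

    def run_user(self, cpu, pmap):
        # user-lwp switched in: its pmap gets loaded on that CPU.
        self.loaded[cpu] = pmap

    def run_kernel(self, cpu):
        # kern-lwp (e.g. idle) switched in: the previous user pmap is
        # kept loaded, since the kernel is mapped in it anyway.
        pass

    def shootdown_targets(self, pmap):
        if pmap == KERNEL:
            # The kernel is mapped everywhere: all CPUs are targets.
            return set(range(self.ncpu))
        # Otherwise: every CPU where this pmap is still loaded.
        return {c for c, p in enumerate(self.loaded) if p == pmap}


def migrating(ncpu=4):
    """I/O program hops across all cores; idle runs after it leaves."""
    m = Machine(ncpu)
    for cpu in range(ncpu):
        m.run_user(cpu, "io_prog")
        m.run_kernel(cpu)
    m.run_user(0, "io_prog")  # it is currently running on CPU 0
    return m


def with_spinners(ncpu=4):
    """Same, then N-1 other user threads run on the remaining cores."""
    m = migrating(ncpu)
    for cpu in range(1, ncpu):
        m.run_user(cpu, "spin%d" % cpu)
    return m


def pinned(ncpu=4):
    """I/O program bound to CPU 0 from the start."""
    m = Machine(ncpu)
    m.run_user(0, "io_prog")
    return m


if __name__ == "__main__":
    for name, make in (("migrating", migrating),
                       ("with_spinners", with_spinners),
                       ("pinned", pinned)):
        print(name, sorted(make().shootdown_targets("io_prog")))
    print("kernel", sorted(pinned().shootdown_targets(KERNEL)))
```

In this model the migrating case targets every CPU, while both the N-1
threads case and the pinned case target only the CPU the program is
running on. A kernel pmap update targets all CPUs in every case.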

(Please CC me on any reply.)

Follow-Ups:
- Re: kern/53124 (FFS is slow because pmap_update doesn't scale)
  - From: Michael van Elst
