tech-kern archive

Re: libc.so's vmobjlock / v_interlock



On Sun, Jan 19, 2020 at 09:50:24PM +0000, Andrew Doran wrote:

> Here's a dtrace flamegraph for a kernel build on my system, while running a
> kernel from the ad-namecache branch.
> 
> 	http://www.netbsd.org/~ad/2020/jan19.svg
> 
> The biggest single remaining concurrency problem is the mutex on libc.so's
> uvm_object / vnode.  It pops up in a number of places, but most clearly seen
> above the "uvm_fault_internal" box.
> 
> I think making the vmobjlock / v_interlock a rwlock could help with making
> inroads on this problem, because in the libc.so case it's not modifying too
> much under cover of the lock (mostly pmap state).
> 
> That's not a small undertaking, but I reckon it would take about a day of
> donkey work to change every mutex_enter(..) to a rw_enter(.., RW_WRITER) and
> make a start at choosing rw_write_held() or rw_lock_held() for the
> assertions, followed by a bit of benchmarking to check for a regression and
> a jumbo commit.  From there on things could be changed incrementally.
> 
> Thoughts?

Here's a first set of changes to do just this.  I don't plan on doing more
on this for about a month because I will be travelling soon.

	http://www.netbsd.org/~ad/2020/vmobjlock.diff
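For illustration, the mechanical shape of that conversion is roughly the
following (a hedged sketch, not hunks from the diff; uobj stands for any
struct uvm_object *):

	/* Before: vmobjlock is a mutex, so every hold is exclusive. */
	mutex_enter(uobj->vmobjlock);
	/* ... */
	mutex_exit(uobj->vmobjlock);

	/* After: vmobjlock is an rwlock, taken as a writer to start with. */
	rw_enter(uobj->vmobjlock, RW_WRITER);
	/* ... */
	rw_exit(uobj->vmobjlock);

	/* Assertions: rw_write_held() where exclusive access is required,
	 * rw_lock_held() where either a read or write hold will do. */
	KASSERT(rw_write_held(uobj->vmobjlock));
	KASSERT(rw_lock_held(uobj->vmobjlock));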

Andrew


- This splits v_interlock apart from vmobjlock.  v_interlock stays a mutex,
  vmobjlock becomes an RW lock.  The lock order is vmobjlock -> v_interlock.
  There are a few shared items that need to be examined closely and either
  fixed or given a good excuse, e.g. VI_EXECMAP, but it's mostly right (see
  the lock-order sketch after this list).

- I made anons/amaps have an rwlock too.  There is bound to be an application
  for this and I think it might be PostgreSQL; must check.

- There seems to be no performance regression from doing this.  Not having
  vnode stuff adding contention to the vmobjlock makes the numbers better.

To get to the point of doing concurrent faults for the simple cases, like
those needed for libc.so, I think the following are also needed:

- PV lists in the pmap need explicit locking (unless the pmap uses a global
  lock, which is fine if it suits the machine).  I have done the PV list
  locking for x86 and it's in the patch.

  I'm suggesting a lock still needs to be held on the UVM object / amap for
  pmap ops, but that can be a read lock (first sketch after this list); for
  x86 I want a write lock held only for pmap_page_protect(VM_PROT_NONE) aka
  pmap_page_remove(), because trying to have the x86 pmap walk the PV list
  while removing PTEs, with everything changing around it, and keep it all
  nicely concurrent, would be really tough.  I don't think ppp(VM_PROT_NONE)
  is a hotspot anyway.

- With the above done uvm_unmap_remove() and uvm_map_protect() can take
  a reader lock on the object (second sketch after this list).  I've tried
  this out and it works fine "on my machine", except that in the libc.so
  case it then starts to compete with uvm_fault() for the vmobjlock.

- PG_BUSY could become a reader/writer lock of sorts; I'm thinking something
  like a counter in struct vm_page, and I like the idea of covering it with
  pg->interlock (third sketch after this list).  Interested to hear any
  other ideas.

- Then it's a trip into the uvm_fault() -> getpages -> uvn_findpages()
  path looking for what's needed; I expect probably not a whole lot if you
  only want to handle faults for stuff that's in-core using a read lock.
  I don't know the uvm_fault() code too well.
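
First sketch: the read-vs-write rule for pmap ops suggested above, as a
hedged illustration rather than code from the patch:

	/* Common case: a read hold on the object is enough for pmap ops;
	 * the pmap's own PV list locking covers the rest. */
	rw_enter(uobj->vmobjlock, RW_READER);
	pmap_protect(pmap, va, va + PAGE_SIZE, VM_PROT_READ);
	rw_exit(uobj->vmobjlock);

	/* Exception (x86): removing every mapping of a page takes a write
	 * hold, since the pmap walks the whole PV list tearing down PTEs. */
	rw_enter(uobj->vmobjlock, RW_WRITER);
	pmap_page_protect(pg, VM_PROT_NONE);	/* aka pmap_page_remove() */
	rw_exit(uobj->vmobjlock);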
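
Second sketch: the uvm_unmap_remove() / uvm_map_protect() path taking a
reader lock around its pmap work; just the shape, with the amap side and
error handling omitted:

	rw_enter(uobj->vmobjlock, RW_READER);
	pmap_remove(map->pmap, entry->start, entry->end);
	rw_exit(uobj->vmobjlock);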
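
Third sketch: one possible shape for a busy counter in struct vm_page
covered by pg->interlock.  The pg_busycnt field and the condvar are
invented for illustration:

	/* Shared "busy" hold: any number of holders, like a read lock. */
	mutex_enter(&pg->interlock);
	pg->pg_busycnt++;			/* hypothetical counter */
	mutex_exit(&pg->interlock);

	/* ... work on the page ... */

	mutex_enter(&pg->interlock);
	if (--pg->pg_busycnt == 0)
		cv_broadcast(&pg_busy_cv);	/* hypothetical condvar */
	mutex_exit(&pg->interlock);

An exclusive holder would then cv_wait() on pg->interlock until the count
drains to zero before claiming the page.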

