Re: Blocking vcache_tryvget() across VOP_INACTIVE() - unneeded

To: Andrew Doran <ad%netbsd.org@localhost>
Subject: Re: Blocking vcache_tryvget() across VOP_INACTIVE() - unneeded
From: Mateusz Guzik <mjguzik%gmail.com@localhost>
Date: Tue, 21 Jan 2020 23:12:06 +0100

On 1/21/20, Andrew Doran <ad%netbsd.org@localhost> wrote:
> On Thu, Jan 16, 2020 at 04:51:44AM +0100, Mateusz Guzik wrote:
>>
>> I'm assuming the goal for the foreseeable future is to achieve path
>> lookup with shared vnode locks.
>
> Good guess.  There is a prototype of LK_SHARED lookup on the ad-namecache
> branch, along with lookup using namecache only that takes as few vnode
> locks and refcounts as possible.  In the easiest case only the root vnode
> and the leaf vnode are touched (although whether the root vnode ever
> actually needs a reference is another question).  There are many
> "non-easy" cases though.
>

As in, you can get away without ever accessing it as long as there are
no long enough ".." chains and no absolute paths (including in symlinks)?

My solution to the problem (not implemented yet) is to introduce a
copy-on-write structure which holds the ref to the rootvnode. Then lwps
can just use it, at worst refing/unrefing that struct but never the
vnode. This also covers the current working directory.
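
A minimal userland sketch of what such a structure could look like. None
of this is existing code: struct pwd_cow, its fields and the pwd_cow_*()
functions are hypothetical names, struct vnode is reduced to a bare
counter, and how a new copy gets published to other lwps (and kept safe
against a concurrent hold) is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vnode; only the use count matters here. */
struct vnode {
	atomic_uint v_usecount;
};

/* Hypothetical copy-on-write bundle; never modified once created. */
struct pwd_cow {
	atomic_uint   pc_refcnt;  /* references to the bundle itself */
	struct vnode *pc_rootdir; /* one vnode ref owned by the bundle */
	struct vnode *pc_cwd;     /* same for the working directory */
};

static void vref(struct vnode *vp)  { atomic_fetch_add(&vp->v_usecount, 1); }
static void vrele(struct vnode *vp) { atomic_fetch_sub(&vp->v_usecount, 1); }

/* Build a bundle; the only place where the vnodes gain references. */
static struct pwd_cow *
pwd_cow_create(struct vnode *rootdir, struct vnode *cwd)
{
	struct pwd_cow *pc;

	assert(rootdir != NULL && cwd != NULL);
	pc = malloc(sizeof(*pc));
	if (pc == NULL)
		return NULL;
	atomic_init(&pc->pc_refcnt, 1);
	vref(rootdir);
	vref(cwd);
	pc->pc_rootdir = rootdir;
	pc->pc_cwd = cwd;
	return pc;
}

/* What a lookup would do: bump the bundle's counter, not the vnode's. */
static struct pwd_cow *
pwd_cow_hold(struct pwd_cow *pc)
{
	atomic_fetch_add(&pc->pc_refcnt, 1);
	return pc;
}

/* Drop a bundle reference; the last one gives the vnode refs back. */
static void
pwd_cow_rele(struct pwd_cow *pc)
{
	if (atomic_fetch_sub(&pc->pc_refcnt, 1) == 1) {
		vrele(pc->pc_rootdir);
		vrele(pc->pc_cwd);
		free(pc);
	}
}

/* chdir-style change: make a modified copy, leave the old one alone. */
static struct pwd_cow *
pwd_cow_chdir(struct pwd_cow *old, struct vnode *newcwd)
{
	return pwd_cow_create(old->pc_rootdir, newcwd);
}
```

In this sketch the vnode counters only move when a bundle is created or
destroyed, so the common lookup path contends on the bundle's counter at
worst.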

I have seen the branch. I think the use of rb trees instead of a hash
table is pessimal.

My speculation is that the wins over the current code boil down to the
per-cpu locks being eliminated. Specifically, since v_interlock is a lock
from bufobj and is heavily used in vm, the likelihood of getting
preempted while holding it increases. Say a thread on CPU0 takes that
CPU's pcpu lock and proceeds to take v_interlock. Now the thread is taken
off the CPU to wait. If the new thread running on CPU0 performs *any*
path lookup it will also block, instead of only blocking if it was
looking up the affected vnode. In other words, there is huge potential to
stall *all* lookups on a given CPU.

This does not happen with rb trees and would not happen with the hash
table if there was bucket locking instead of per-cpu.
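
The difference can be put as a toy model. Everything below is made up for
illustration (struct stalled_holder, blocks_percpu(), blocks_bucket(),
the table size and the FNV-1a hash), and it only models the cache lock,
not v_interlock itself: one thread went off CPU while holding a cache
lock, and the question is which other lookups now wait behind it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64	/* hypothetical number of hash buckets */

/* Hypothetical hash (32-bit FNV-1a), used only to pick a bucket. */
static size_t
lock_bucket(const char *name)
{
	uint32_t h = 2166136261u;

	for (; *name != '\0'; name++) {
		h ^= (unsigned char)*name;
		h *= 16777619u;
	}
	return h % NBUCKETS;
}

/* The thread that was taken off CPU while holding a cache lock. */
struct stalled_holder {
	int         cpu;	/* CPU whose pcpu lock it would hold */
	const char *name;	/* name it was looking up */
};

/* Per-cpu scheme: the lock is picked by the CPU running the lookup. */
static bool
blocks_percpu(const struct stalled_holder *h, int cpu, const char *name)
{
	(void)name;		/* the name does not matter at all */
	return cpu == h->cpu;
}

/* Bucket scheme: the lock is picked by the name being looked up. */
static bool
blocks_bucket(const struct stalled_holder *h, int cpu, const char *name)
{
	(void)cpu;		/* the CPU does not matter at all */
	return lock_bucket(name) == lock_bucket(h->name);
}
```

With the per-cpu scheme every lookup scheduled on the holder's CPU waits,
whatever it is looking for. With bucket locks only lookups hashing to the
same bucket wait.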

I would stick to hash tables since they are easier to scale (both with
and without locks).

For instance, if 2 threads look up "/bin" and "/usr" and there is no
hash collision, they lock *separate* buckets and suffer less in terms
of cacheline bouncing. In comparison, with rb trees they will take the
same lock. Of course this similarly does not scale if they are looking
up the same path.
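
A rough userland sketch of bucket locking, to make the "/bin" vs "/usr"
case concrete. It is not the namecache code: the nc_* names, the fixed
size buckets, the FNV-1a hash and the C11 atomic_flag spinlocks are all
assumptions picked to keep the example short.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NC_NBUCKETS  64	/* hypothetical table size */
#define NC_BUCKETLEN 4	/* hypothetical fixed bucket capacity */

struct nc_entry {
	const char *name;
	int         vnode_id;	/* stand-in for a vnode pointer */
};

struct nc_bucket {
	atomic_flag     lock;	/* one spinlock per bucket */
	int             nused;
	struct nc_entry slots[NC_BUCKETLEN];
};

static struct nc_bucket nc_table[NC_NBUCKETS];

static void
nc_init(void)
{
	for (size_t i = 0; i < NC_NBUCKETS; i++) {
		atomic_flag_clear(&nc_table[i].lock);
		nc_table[i].nused = 0;
	}
}

/* 32-bit FNV-1a over the name; selects the bucket and thus the lock. */
static struct nc_bucket *
nc_bucket_for(const char *name)
{
	uint32_t h = 2166136261u;

	for (const char *p = name; *p != '\0'; p++) {
		h ^= (unsigned char)*p;
		h *= 16777619u;
	}
	return &nc_table[h % NC_NBUCKETS];
}

static void
nc_lock(struct nc_bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock,
	    memory_order_acquire))
		continue;	/* spin until the holder clears the flag */
}

static void
nc_unlock(struct nc_bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Add or update an entry; returns false if the bucket is full. */
static bool
nc_enter(const char *name, int vnode_id)
{
	struct nc_bucket *b = nc_bucket_for(name);
	bool ok = true;
	int i;

	nc_lock(b);
	for (i = 0; i < b->nused; i++)
		if (strcmp(b->slots[i].name, name) == 0)
			break;
	if (i == NC_BUCKETLEN)
		ok = false;
	else {
		b->slots[i].name = name;
		b->slots[i].vnode_id = vnode_id;
		if (i == b->nused)
			b->nused++;
	}
	nc_unlock(b);
	return ok;
}

/* Returns the cached id, or -1 on a miss; locks only one bucket. */
static int
nc_lookup(const char *name)
{
	struct nc_bucket *b = nc_bucket_for(name);
	int id = -1;

	nc_lock(b);
	for (int i = 0; i < b->nused; i++)
		if (strcmp(b->slots[i].name, name) == 0)
			id = b->slots[i].vnode_id;
	nc_unlock(b);
	return id;
}
```

Two lookups only meet on a lock when their names hash to the same bucket,
so "bin" and "usr" normally write to different cachelines. Two lookups of
the same name still hit the same lock, as noted above.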

-- 
Mateusz Guzik <mjguzik gmail.com>
