Re: Please review: lookup changes



>> locks and not hashing itself. Note that at the time the vnode
>> interlock and the vm object lock were the same thing, and in this
>> workload the lock is under a lot of contention. Should the lock owner
>> get preempted, anyone doing a lookup on the affected vnode (e.g.,
>> libc) will be holding the relevant per-cpu lock and will block on a
>> turnstile. Whoever ends up running on the affected cpu is likely to
>> do a lookup of their own, but the relevant per-cpu lock is taken, so
>> they go off cpu as well. The same thing happening on more than one
>> cpu at a time could easily result in a cascading failure, which I
>> strongly suspect is precisely what happened.
>>
>> That is, the win does not stem from rb trees but from finer-grained
>> locking, which does not block other threads looking up something else
>> on the same cpu.
>
> Not on NetBSD.  Kernel preemption is possible and allowed (mainly for
> real time applications), but happens infrequently during normal
> operation.  There are a number of pieces of code that take advantage
> of that fact and are "optimistically per-CPU", and they work very well
> as preemption is rarely observed.  Further, the blocking case on
> v_interlock in cache_lookup() is rare.  That's not to say it doesn't
> happen, it does, but I don't think it's enough to explain the
> performance differences.
>

I noted that I suspect preemption was occurring because of contention
on the VM side. It should only take one preempted lock holder to start
the cascade.
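
To make the scenario concrete, here is a rough sketch of the blocking
pattern I mean (nc_cpu, nc_cpu_tab and nc_find are made-up names for
illustration, not the actual NetBSD code):

#include <sys/param.h>
#include <sys/errno.h>
#include <sys/mutex.h>
#include <sys/vnode.h>
#include <sys/namei.h>

/* Hypothetical per-CPU namecache state; illustrative layout only. */
struct nc_cpu {
        kmutex_t        lock;   /* serialises all lookups on this CPU */
};

static struct nc_cpu nc_cpu_tab[MAXCPUS];

static int
cache_lookup_sketch(struct vnode *dvp, const char *name, size_t namelen,
    struct vnode **vpp)
{
        struct nc_cpu *ncc = &nc_cpu_tab[cpu_index(curcpu())];
        struct namecache *ncp;

        mutex_enter(&ncc->lock);
        ncp = nc_find(ncc, dvp, name, namelen); /* hypothetical helper */
        if (ncp == NULL) {
                mutex_exit(&ncc->lock);
                return ENOENT;
        }
        /*
         * If the interlock owner was preempted, we sleep on a
         * turnstile *while still holding the per-CPU lock*, so every
         * other lookup landing on this CPU goes off cpu as well.
         */
        mutex_enter(ncp->nc_vp->v_interlock);
        *vpp = ncp->nc_vp;
        mutex_exit(ncp->nc_vp->v_interlock);
        mutex_exit(&ncc->lock);
        return 0;
}

With this happening on more than one CPU at once you get the cascade.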

>> As mentioned earlier I think rb trees instead of a hash are pessimal
>> here.
>>
>> First, a little step back. The lookup starts with securing vnodes
>> from cwdinfo. This represents a de facto global serialisation point
>> (times two, since the work has to be reverted later). In FreeBSD I
>> implemented an equivalent with copy-on-write semantics. The short
>> version is that I take an rwlock-equivalent and then grab a reference
>> on the found struct. This provides me with an implicit reference on
>> the root and current working directory vnodes. If the struct is
>> unshared on fork, the aforementioned serialisation point becomes
>> localized to the process.
>
> Interesting.  Sounds somewhat like what both NetBSD and FreeBSD do for
> process credentials.
>

I was thinking about doing precisely that, but I found it iffy to have
permanently stored per-thread references. With this proposal they are
only "gained" around the actual lookup; otherwise this is very similar
to what mountcheckdirs is dealing with right now.
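
Roughly what I have in mind, with made-up names (cwd_snapshot,
p_cwdlock, p_cwdsnap and cwd_snapshot_free are illustrative, not the
actual FreeBSD code):

#include <sys/atomic.h>
#include <sys/rwlock.h>

/* Copy-on-write snapshot of the root + current directory vnodes. */
struct cwd_snapshot {
        volatile u_int  ref;
        struct vnode    *cdir;  /* implicitly held while ref > 0 */
        struct vnode    *rdir;  /* implicitly held while ref > 0 */
};

static struct cwd_snapshot *
cwd_hold(struct proc *p)
{
        struct cwd_snapshot *snap;

        /* rwlock-equivalent, held just long enough to bump the ref */
        rw_enter(&p->p_cwdlock, RW_READER);     /* hypothetical field */
        snap = p->p_cwdsnap;                    /* hypothetical field */
        atomic_inc_uint(&snap->ref);
        rw_exit(&p->p_cwdlock);
        return snap;    /* implicit refs on cdir/rdir for the lookup */
}

static void
cwd_rele(struct cwd_snapshot *snap)
{
        if (atomic_dec_uint_nv(&snap->ref) == 0)
                cwd_snapshot_free(snap);        /* drops the vnode refs */
}

chdir/chroot install a fresh snapshot under the writer lock, so a
lookup never sees a torn cdir/rdir pair, and an unshared struct makes
the writer lock process-local.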

>> In my tests even with lookups which share most path components, the
>> last one tends to be different. Using a hash means this typically
>> results in grabbing different locks for that case and consequently
>> fewer cache-line ping pongs.
>
> My experience has been different.  What I've observed is that shared hash
> tables usually generate huge cache pressure unless they're small and rarely
> updated.  If the hash were small, matching the access pattern (e.g.
> per-dir), then I think it would have the opportunity to make maximum use of
> the cache.  That could be a great win and certainly better than rbtree.
>

Well, in my tests this is all heavily dominated by SMP effects, which I
expect to be exacerbated by having just one lock.
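
For illustration, the property I care about is bucket-lock granularity;
something along these lines (names made up):

#include <sys/param.h>
#include <sys/hash.h>
#include <sys/mutex.h>
#include <sys/queue.h>

#define NC_NBUCKET      256     /* power of two */

struct nc_bucket {
        kmutex_t        lock;
        LIST_HEAD(, namecache) head;
} __aligned(COHERENCY_UNIT);    /* one lock per cache line */

static struct nc_bucket nc_bucket_tab[NC_NBUCKET];

static struct nc_bucket *
nc_bucket_for(struct vnode *dvp, const char *name, size_t namelen)
{
        uint32_t hash;

        hash = hash32_buf(name, namelen, (uint32_t)(uintptr_t)dvp);
        return &nc_bucket_tab[hash & (NC_NBUCKET - 1)];
}

Two lookups which differ only in the final component take different
bucket locks here; with a single lock per directory rb tree they all
bounce the same lock's cache line.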

A side note: I had a look at your vput. The pre-read + VOP_UNLOCK + the
actual loop to drop the ref definitely slows things down, if only a
little, as the pre-read can force the cache line into shared state from
under someone doing a cmpxchg.
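
Schematically, the pattern I mean (not your actual code; the
v_usecount handling is simplified for illustration):

#include <sys/atomic.h>
#include <sys/vnode.h>

static void
vput_sketch(struct vnode *vp)
{
        u_int old;

        /*
         * Pre-read: pulls the counter's cache line in Shared state,
         * demoting the Modified copy held by any CPU concurrently
         * cmpxchg-ing the same counter; that CPU must then re-own
         * the line before its cmpxchg can succeed.
         */
        if (vp->v_usecount > 1) {
                VOP_UNLOCK(vp);
                do {
                        old = vp->v_usecount;
                } while (atomic_cas_uint(&vp->v_usecount, old,
                    old - 1) != old);
                return;
        }
        /* last-reference path elided */
}

An unconditional atomic decrement on the common path would touch the
line exactly once instead.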

That said, can you generate a flamegraph from a fully patched kernel?
I'm curious where the time is spent now; my bet is spinning on vnode
locks.

-- 
Mateusz Guzik <mjguzik gmail.com>


