tech-kern archive


Re: NCHNAMLEN vnode cache limitation removal



On 9/14/19, Mouse <mouse%rodents-montreal.org@localhost> wrote:
>> [...], because fexecve is causing rumbles about doing significantly
>> more reverse lookups.
>
> Why is everyone so concerned about finding "the" name of an inode, or
> indeed any name for an inode?  As far as I can tell there has never
> been any promise that any given inode has any names pointing to it, at
> least not unless it's got no other references to it (in which case it
> would be freed if it had no names).
>
> Given the list of smart people I've seen discussing this, I'm
> presumably just missing something, but what?
>

I think name resolution that always works is nicer to users when it
comes to diagnosing issues in userspace. In particular, if someone spawns
a new program and that program performs a lot of ops on a given fd, you
can check what file it is if name resolution works (or if you have a
facility similar to Linux's /proc/<pid>/fd). Otherwise you are left with
dirty hacks.

There is a very different angle to this though, and it has to do with
performance, especially in the face of concurrent access. When doing SMP
you want to avoid writes to shared areas, as they induce cache misses
on the next access by other CPUs. If everyone accesses read-only, they
can keep doing it without interfering with others. If everyone writes
to it, everyone else (minus 1) stalls waiting for their turn.

The vast majority of lookups only care about the last path component.
Currently the kernel leapfrogs through all of them. Consider, for
example, the lookup of /foo/bar/baz/qux.

Along the way it refs foo, locks foo, finds bar, refs bar, locks
bar, unlocks and unrefs foo... then repeat that with s/bar/baz/
and s/foo/bar/. Note that's very simplified (e.g. I skipped the
beginning of the lookup and other work like permission checking).
Crossing filesystems is another hurdle. This is only to highlight
the point below.
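
To make the leapfrogging concrete, here is a minimal userspace sketch
of the walk described above. Everything in it is made up for
illustration: struct node, ref/unref/lock/unlock, the writes counter
and leapfrog_lookup are hypothetical names, not NetBSD or FreeBSD
kernel interfaces, and plain ints stand in for atomic refcounts and
real locks. The only point is to count how many writes land on shared
nodes during one lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXKIDS 4

/* Hypothetical toy node; NOT the kernel's struct vnode. */
struct node {
    const char *name;
    struct node *kids[MAXKIDS];
    int refs;    /* stands in for an atomic reference count */
    int locked;  /* stands in for a per-node lock */
    int writes;  /* writes to this (shared) node so far */
};

/* Each helper is one write to memory other CPUs would also touch. */
static void ref(struct node *n)    { n->refs++; n->writes++; }
static void unref(struct node *n)  { assert(n->refs > 0); n->refs--; n->writes++; }
static void lock(struct node *n)   { assert(!n->locked); n->locked = 1; n->writes++; }
static void unlock(struct node *n) { assert(n->locked); n->locked = 0; n->writes++; }

static struct node *find_child(struct node *dir, const char *name, size_t len)
{
    for (int i = 0; i < MAXKIDS && dir->kids[i] != NULL; i++) {
        const char *k = dir->kids[i]->name;
        if (strlen(k) == len && memcmp(k, name, len) == 0)
            return dir->kids[i];
    }
    return NULL;
}

/*
 * Leapfrog walk: ref+lock the child, then unlock+unref the parent.
 * Returns the last component referenced and locked, or NULL if a
 * component is missing (with nothing left referenced or locked).
 */
static struct node *leapfrog_lookup(struct node *root, const char *path)
{
    struct node *cur = root;

    ref(cur);
    lock(cur);
    while (*path != '\0') {
        while (*path == '/')
            path++;
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/");
        struct node *next = find_child(cur, path, len);
        if (next == NULL) {
            unlock(cur);
            unref(cur);
            return NULL;
        }
        ref(next);
        lock(next);
        unlock(cur);
        unref(cur);
        cur = next;
        path += len;
    }
    return cur;
}
```

In this toy, looking up /foo/bar/baz/qux costs four writes on each of
the root, foo, bar and baz, plus two on qux, so 18 writes to shared
nodes for a lookup that only wanted qux.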

That's a lot of single-threaded work, which can be partially avoided in
the common case, provided all the info is structured for it. But most
importantly that's a lot of atomic ops on shared areas, which severely
limit your performance in case of concurrent access (especially on
multi-socket systems). It's a long way to get to a point where you can
do this in a write-free manner, and FreeBSD is still very early on that
road.

If everything is cached and you can roll forward in the common case
without refing/locking anything but the last component you win big
time. (Of course you can't "just" do it, the leapfrogging is there for
a reason but it can be worked out with tracking the state and having
safe memory reclamation.)
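
As a rough illustration of what "tracking the state" could look like,
here is a sketch that walks a path while only reading, using a
per-node sequence counter to notice concurrent changes. This is one
possible technique (a seqlock-style check) that I am swapping in for
the example, not a description of what any BSD kernel does. struct
lnode, its seq field and lockless_lookup are hypothetical names, each
directory has a single child to keep it short, and safe memory
reclamation is simply assumed (nodes are never freed here).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy node; seq is an assumed per-node change counter. */
struct lnode {
    const char *name;
    _Atomic(struct lnode *) child; /* one child keeps the sketch short */
    atomic_uint seq;               /* even = stable, odd = being modified */
    atomic_int refs;
};

/*
 * Walk names[0..n-1] below root without locking or referencing the
 * intermediate nodes. Returns the last node with one reference taken,
 * or NULL, which here means "could not finish lock-free; the caller
 * should fall back to the locked (leapfrog) lookup".
 */
static struct lnode *lockless_lookup(struct lnode *root,
                                     const char **names, size_t n)
{
    struct lnode *cur = root;

    for (size_t i = 0; i < n; i++) {
        unsigned before = atomic_load_explicit(&cur->seq,
                                               memory_order_acquire);
        if (before & 1u)
            return NULL;            /* modification in progress */

        struct lnode *next = atomic_load_explicit(&cur->child,
                                                  memory_order_acquire);
        if (next == NULL || strcmp(next->name, names[i]) != 0)
            return NULL;

        /* Re-check the counter: did cur change while we read it? */
        atomic_thread_fence(memory_order_acquire);
        if (atomic_load_explicit(&cur->seq, memory_order_relaxed) != before)
            return NULL;

        cur = next;
    }

    /*
     * The only write to shared memory on the success path. A real
     * implementation would also need to revalidate after taking the
     * reference; that is left out of this sketch.
     */
    atomic_fetch_add(&cur->refs, 1);
    return cur;
}
```

With everything stable, the walk to qux writes to exactly one node
(the reference on qux). If any directory on the way has an odd or
changed seq, the sketch gives up and expects the caller to redo the
lookup the slow way.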

TL;DR it's a minor quality-of-life improvement for users and a de facto
mandatory part of VFS if it is to scale on big systems.

-- 
Mateusz Guzik <mjguzik gmail.com>


