tech-kern archive


Re: Please review: lookup changes



> Date: Mon, 9 Mar 2020 19:59:03 +0000
> From: Andrew Doran <ad%netbsd.org@localhost>
> 
> As far as I recall layer vnodes persist unless recycled like any other
> vnode, so there should be nothing special happening there around contention
> due to lifecycle.  I think you could cache or transpose the cached names
> above, maybe using some kind of VOP.  I definitely think it's doable but
> beyond that, at least for me there are lots of unknowns so I have avoided it.

This reminds me that we need to find a way to release upper layer
vnodes when the lower layer is recyclable -- see the comment in
layer_inactive.  Otherwise files that are deleted while still open
through a null mount may persist on disk indefinitely, for as long as
the upper-layer vnode is still around...

(But no need for that to be in the scope of namecache improvements.)
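A rough sketch of the idea, just to make it concrete -- this is not
existing code, and the unlink check via VOP_GETATTR()/va_nlink is
purely illustrative (the real test would likely look quite different):

#include <sys/param.h>
#include <sys/lwp.h>
#include <sys/vnode.h>
#include <miscfs/genfs/layer.h>

/*
 * Hypothetical sketch only, not the current layer_inactive().  Idea:
 * if the lower vnode is effectively dead (e.g. an unlinked file), ask
 * the vnode layer to recycle the upper vnode too, so an idle
 * null-mount vnode doesn't pin the deleted file's storage on disk.
 */
int
layer_inactive(void *v)
{
	struct vop_inactive_v2_args /* {
		struct vnode *a_vp;
		bool *a_recycle;
	} */ *ap = v;
	struct vnode *vp = ap->a_vp;
	struct vattr va;

	/* ... existing layer_inactive() logic ... */

	/* Hypothetical check: unlinked below, so let the upper vnode go. */
	if (VOP_GETATTR(LAYERVPTOLOWERVP(vp), &va, curlwp->l_cred) == 0 &&
	    va.va_nlink == 0)
		*ap->a_recycle = true;

	return 0;
}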

> I also expect that if ufs_lookup() were opened up to allow a shared lock
> along the full path then pressure would be diverted to buffer locks
> previously shielded by the vnode lock.  I would be unsurprised to see that
> performing worse under contention than the single exclusive lock with that
> code as-is.  There's also the fact that modifications and uncached lookups*
> are often a tiny proportion of activity on the system in comparison to
> cached lookups, so without a specific application in mind I'm not sure
> there's a great deal to be won there.
> 
> Looking up names directly in the namecache bypasses these problems to some
> extent, because the directory vnode lock needs to be touched far less often,
> and when it is touched it's because there's real work to be done.

OK, makes sense.

Other than layerfs specifically avoiding the namecache for now, are
there any other barriers to pulling the copypasta cache_* calls out of
every VOP_LOOKUP and into namei?
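For reference, the boilerplate in question is the cache_lookup() /
cache_enter() pair that each file system's VOP_LOOKUP open-codes.
Schematically -- simplified arguments, not any particular file system's
actual code:

#include <sys/param.h>
#include <sys/errno.h>
#include <sys/namei.h>
#include <sys/vnode.h>

int
xxx_lookup(struct vnode *dvp, struct vnode **vpp,
    struct componentname *cnp)
{
	int error;

	/* 1. Consult the namecache first. */
	if (cache_lookup(dvp, cnp->cn_nameptr, cnp->cn_namelen,
	    cnp->cn_nameiop, cnp->cn_flags, NULL, vpp))
		return *vpp == NULL ? ENOENT : 0;

	/* 2. Cache miss: do the real (on-disk) directory scan. */
	error = xxx_scan_directory(dvp, vpp, cnp);	/* placeholder */

	/* 3. Record the result for next time (negative entry on ENOENT). */
	if (error == 0 || error == ENOENT)
		cache_enter(dvp, error ? NULL : *vpp,
		    cnp->cn_nameptr, cnp->cn_namelen, cnp->cn_flags);

	return error;
}

Hoisting steps 1 and 3 into namei() would leave only step 2 in each
VOP_LOOKUP.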

> LOCKLEAF is something very old and LOCKSHARED a minor tweak to it (a
> conditional picking either LK_SHARED/LK_EXCLUSIVE on the way out of
> namei()).  I'd prefer that namei() didn't lock any vnodes to begin with but
> that's the scheme we have (see my comment re: 1993 :-).  It's also what
> FreeBSD does for this and in my experience these little differences are
> troublesome WRT code sharing so I decided to match their flag.

How much code sharing can we actually take advantage of in the vfs
layer?
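For reference, the LOCKLEAF/LOCKSHARED behaviour described above comes
down to roughly this on the way out of namei() (sketch only):

	/*
	 * LOCKLEAF: return the leaf vnode locked; LOCKSHARED: allow a
	 * shared lock instead of an exclusive one.  Matches FreeBSD's
	 * flag of the same name.
	 */
	if (ndp->ni_cnd.cn_flags & LOCKLEAF) {
		vn_lock(ndp->ni_vp,
		    ((ndp->ni_cnd.cn_flags & LOCKSHARED) ?
		    LK_SHARED : LK_EXCLUSIVE) | LK_RETRY);
	}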

> > > vfs_syscalls.c:
> > > 
> > > 	Corrected vnode lock types: LK_SHARED ok for VOP_ACCESS(), but
> > > 	LK_EXCLUSIVE still needed for VOP_GETATTR() due to itimes handling.
> > > 	That's a shame and it would be nice to use a seqlock or something
> > > 	there eventually.
> > 
> > I don't understand this itimes handling.  Why do we ever _set_ any
> > itimes in VOP_GETATTR?
> > [...]
> > Why don't we check our watch immediately when we set IN_CHANGE &c.,
> > and defer the disk write to update the inode like UFS_WAPBL_UPDATE
> > does -- and just skip UFS_ITIMES in VOP_GETATTR?
> 
> This looks to go back at least as far as Unix 32V (& possibly further) in one
> form or another, from what I can see.  Whatever the original intent I think
> it might have morphed into an attempt to avoid querying the system clock at
> some point, which is about the only point I can see for it now.

That was my conclusion too.  Let's get rid of it and make VOP_GETATTR
work with a shared lock (if not a seqlock)!
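A minimal sketch of that suggestion, using NetBSD ufs conventions
(DIP_ASSIGN, vfs_timestamp); the helper name is made up and only the
ctime case is shown:

#include <sys/param.h>
#include <sys/vnode.h>
#include <ufs/ufs/inode.h>

/*
 * Hypothetical helper, not existing code: stamp the inode at the
 * moment IN_CHANGE is set, so UFS_ITIMES in VOP_GETATTR has nothing
 * left to do and getattr can run under a shared lock (or seqlock).
 * The on-disk update remains deferred, e.g. via UFS_WAPBL_UPDATE().
 */
static inline void
ufs_note_change(struct inode *ip)
{
	struct timespec ts;

	vfs_timestamp(&ts);
	ip->i_flag |= IN_CHANGE | IN_MODIFIED;
	DIP_ASSIGN(ip, ctime, ts.tv_sec);
	DIP_ASSIGN(ip, ctimensec, ts.tv_nsec);
}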

> > > vfs_vnode.c:
> > > 
> > > 	Namecache related changes, and the change to not block new vnode
> > > 	references across VOP_INACTIVE() mentioned on tech-kern.  I don't
> > > 	think this blocking is needed at least for well behaved FSes like
> > > 	ffs & tmpfs.
> > 
> > hannken@ cautions me that not blocking until VOP_INACTIVE completes
> > may be dangerous.
> > 
> > The last major deadlock I found in lfs is [...]
> 
> Is this something that should be done in VOP_RECLAIM() instead?  Then any
> concerns about VOP_INACTIVE() don't apply since the inode is well and truly
> dead.

I don't think so.  I believe the reason we block during VOP_INACTIVE
is to prevent file systems from gaining new references to vnodes while
vrele is deciding whether to reclaim them -- that way, the decision to
reclaim a vnode is final and made, under the vnode lock, in one place
that doesn't have to worry about backing out.

Historically, the logic that dealt with backing out of the decision to
reclaim a vnode has been riddled with races and bugs and incoherent
workarounds, so I think it's a Good Thing that hannken@ structured the
logic so that the decision really is final.
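To make that concrete, here is a simplified sketch of the flow being
discussed -- not the actual vfs_vnode.c code; interlock and state-flag
details are elided, and vcache_reclaim_sketch() is a placeholder name:

#include <sys/param.h>
#include <sys/vnode.h>

static void
vrele_deactivate_sketch(struct vnode *vp)
{
	bool recycle = false;

	/* New references (vget) are held off while we decide. */
	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
	VOP_INACTIVE(vp, &recycle);	/* fs answers: recycle me? */
	if (recycle) {
		/*
		 * Final decision, made once, under the vnode lock: no new
		 * reference could have appeared in the meantime, so there
		 * is nothing to back out of.  Proceed to VOP_RECLAIM().
		 */
		vcache_reclaim_sketch(vp);	/* placeholder */
	} else {
		VOP_UNLOCK(vp);
	}
}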

For the lfs segment writer, examining a vnode that may be about to be
reclaimed is safe because the lfs segment writer never creates new
references -- actually it just wants to skip all of the vnodes with
ip->i_nlink == 0 because they never need to be written out to disk.
The namecache is a different story, though!

