Re: Add a mountlist iterator

To: Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost>
Subject: Re: Add a mountlist iterator
From: "J. Hannken-Illjes" <hannken%eis.cs.tu-bs.de@localhost>
Date: Sun, 2 Apr 2017 16:48:49 +0200

> On 2. Apr 2017, at 16:34, Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
> 
>> Date: Sun, 2 Apr 2017 11:09:49 +0200
>> From: "J. Hannken-Illjes" <hannken%eis.cs.tu-bs.de@localhost>
>> 
>>> On 1. Apr 2017, at 23:03, Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
>>> 
>>> Any particular reason to use a pointer-to-opaque-pointer here instead
>>> of a caller-allocated struct mountlist_iterator object?
>> 
>> Just to make it opaque to the caller.  I see no reason to make the
>> mountlist internals part of the kernel API and expose them down to
>> the file system implementations.
> 
> The caller-allocated struct mountlist_iterator could be made to
> contain just a single pointer as the current API uses.
> 
> *shrug* Not a big deal either way.
> 
>> As _mountlist_next() exists for DDB only (no locks, no allocs) I see
>> no problems here.
> 
> One might be concerned in ddb with having to chase many pointers in a
> possibly corrupt kernel memory state, so there is some small advantage
> to avoiding list traversal.

Not from the DDB prompt but from operations like "printlockedvnodes()"
or "fstrans_dump()".  If the mountlist is corrupt you're lost.

> 
>>                   Embedding the mountlist_entry in "struct mount"
>> forces us to use a "struct mount" as a marker and this struct
>> has a size of ~2.5 kBytes.
> 
> By `embed' I didn't mean that the marker has to be a whole struct
> mount.  Rather, something more like:
> 
> 	struct mountlist_entry {
> 		TAILQ_ENTRY(mountlist_entry)		mle_tqe;
> 		enum { MNTENT_MARKER, MNTENT_MOUNT }	mle_tag;
> 	};
> 
> 	struct mount {
> 		...
> 		struct mountlist_entry mnt_entry;
> 		...
> 	};
> 
> 	TAILQ_HEAD(, mountlist_entry) mount_list;
> 
> To recover the struct mount from a struct mountlist_entry that is
> tagged as a mount, simply do
> 
> 	struct mountlist_entry *mle = ...;
> 	struct mount *mp = container_of(mle, struct mount, mnt_entry);

Any reason you want "mle_tqe" and "mle_tag" public?

>>> Candidate for cacheline-aligned struct?  Why do we need a new lock
>>> mount_list_lock for this instead of using the existing mountlist_lock?
>> 
>> For the transition it is better to use a separate lock.  After the
>> transition "mountlist_lock" gets removed.
> 
> OK, fair enough.  Can you add a comment over the declaration stating
> that plan?
> 
>>> Beware: TAILQ_PREV (and TAILQ_LAST) violates strict aliasing rules, so
>>> we should probably try to phase it out rather than add new users.
>>> Maybe we should add a new kind of queue(3) that doesn't violate strict
>>> aliasing but supports *_PREV and *_LAST.
>>> 
>>> All that said, it is not clear to me why you need reverse iteration
>>> here.  None of your patches seem to use it.  Are you planning to use
>>> it for some future purpose?
>> 
>> Historical the mountlist gets traversed in mount order.  Some syscalls
>> return in this order so it should be kept.
>> 
>> Unmounting all file system on shutdown has to be done in reverse mount
>> order so here (vfs_unmount_forceone / vfs_unmountall1) we need it.
> 
> I see -- your patch just hasn't addressed those yet.  Maybe Someone^TM
> should just write a replacement for TAILQ so we can use it here now.

File 002_ml_iterator_switch around line 453 and 487.

>>> Why do we need a new refcnt dance here?  Why isn't it enough to just
>>> use vfs_busy like all the existing callers?
>> 
>> Because vfs_busy() expects the caller to already hold a reference and
>> one of the goals here is to drop the mount_list_lock as early as possible.
> 
> I suppose you want to avoid setting down a lock order for
> mount_list_lock and mp->mnt_unmounting.  I guess this doesn't matter
> much since it's limited to vfs internals.
> 
>>> Can you make another separate commit to drop the keepref parameter to
>>> vfs_unbusy, and change all the callers with keepref=true to just use
>>> vfs_destroy themselves?
>> 
>> You mean the opposite (keepref=false -> caller does vfs_destroy), yes?
>> 
>> Of the 81 usages of vfs_unbusy() only 6 have keepref=true, all others
>> would have to be changed to "vfs_unbusy(mp); vfs_destroy(mp);".
> 
> Heh.  This confusion on my part suggest two things:
> 
> 1. Perhaps in this change, we ought to add a different function for
> use with mount iteration, say mountlist_iterator_next_done, so that
> any legacy confusion about the vfs_busy/unbusy API is hidden:
> 
> 	mountlist_iterator_init(&it);
> 	while ((mp = mountlist_iterator_next(&it)) != NULL) {
> 		...
> 		mountlist_iterator_next_done(&it, mp);
> 	}
> 	mountlist_iterator_destroy(&it);

Will think about it -- looks ok so far.

> 2. Perhaps in another change, we ought to replace vfs_unbusy(mp,
> false) by vfs_unbusy(mp), and vfs_unbusy(mp, true) by
> 
> vfs_acquire(mp);
> vfs_unbusy(mp);
> 
> so that there is less confusing boolean-flagged asymmetry between
> vfs_busy/vfs_unbusy.
> 
> Thus, the busy count (determining whether a mount can be unmounted
> from the file system namespace) and the reference count (determining
> whether the struct mount kernel data structures can be freed) would be
> maintained in parallel:
> 
> 	/* file system mount point busy count (implies reference count too) */
> 	error = vfs_busy(mp);
> 	if (error)
> 		goto fail;
> 	...
> 	vfs_unbusy(mp);
> 
> 	/* data structure reference count */
> 	vfs_acquire(mp);
> 	...
> 	vfs_destroy(mp);

dito.

> (Meanwhile, vfs_release would be a better name than vfs_destroy, in a
> subsequent change.)

--
J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig (Germany)

Follow-Ups:
- Re: Add a mountlist iterator (round 2)
  - From: J. Hannken-Illjes

References:
- Re: Add a mountlist iterator
  - From: Taylor R Campbell

Prev by Date: Re: locking from VOP_INACTIVE to VOP_RECLAIM
Next by Date: Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL
Previous by Thread: Re: Add a mountlist iterator
Next by Thread: Re: Add a mountlist iterator (round 2)
Indexes:

Home | Main Index | Thread Index | Old Index