Re: Add a mountlist iterator

To: Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost>
Subject: Re: Add a mountlist iterator
From: "J. Hannken-Illjes" <hannken%eis.cs.tu-bs.de@localhost>
Date: Sun, 2 Apr 2017 11:09:49 +0200

> On 1. Apr 2017, at 23:03, Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
> 
>> Date: Thu, 30 Mar 2017 11:21:41 +0200
>> From: "J. Hannken-Illjes" <hannken%eis.cs.tu-bs.de@localhost>
>> 
>> Plan is to untangle the mountlist processing from vfs_busy() / vfs_unbusy()
>> and add an iterator for the mountlist:
> 
> Generally seems like a good idea to me!
> 
>> - void mountlist_iterator_init(mount_iterator_t **)
>>  to initialize an iterator in mounted order.
> 
> Any particular reason to use a pointer-to-opaque-pointer here instead
> of a caller-allocated struct mountlist_iterator object?

Just to make it opaque to the caller.  I see no reason to make the
mountlist internals part of the kernel API and expose them down to
the file system implementations.

> struct mountlist_iterator it;
> 
> mountlist_iterator_init(&it);
> while ((mp = mountlist_iterator_next(&it, ...)) != NULL)
> 	...
> mountlist_iterator_destroy(&it);
> 
> Then there's no need to kmem_alloc any intermediate data structures.
> 
> Embedding the struct mountlist_entry in struct mount would obviate the
> need for _mountlist_next to iterate over the entire list from the
> start, too.

As _mountlist_next() exists for DDB only (no locks, no allocs) I see
no problems here.  Embedding the mountlist_entry in "struct mount"
forces us to use a "struct mount" as a marker and this struct
has a size of ~2.5 kBytes.

> Some in-line comments on the diff.  General comment: can you make your
> diffs with the `-p' option so it's easier to follow which function
> each diff hunk edits?
> 
>   diff -r dab39dcbcb29 -r ff3f89481f2e sys/kern/vfs_mount.c
>   --- a/sys/kern/vfs_mount.c      Thu Mar 30 11:17:56 2017 +0200
>   +++ b/sys/kern/vfs_mount.c      Thu Mar 30 11:17:56 2017 +0200
>   @@ -119,7 +136,9 @@
>   ...
>   +static TAILQ_HEAD(mountlist, mountlist_entry) mount_list;
>   +static kmutex_t mount_list_lock;
> 
> Candidate for cacheline-aligned struct?  Why do we need a new lock
> mount_list_lock for this instead of using the existing mountlist_lock?

For the transition it is better to use a separate lock.  After the
transition "mountlist_lock" gets removed.

>   @@ -1439,10 +1454,168 @@
>   ...
>   +               if (marker->me_type == ME_MARKER)
>   +                       me = TAILQ_NEXT(marker, me_list);
>   +               else
>   +                       me = TAILQ_PREV(marker, mountlist, me_list);
> 
> Beware: TAILQ_PREV (and TAILQ_LAST) violates strict aliasing rules, so
> we should probably try to phase it out rather than add new users.
> Maybe we should add a new kind of queue(3) that doesn't violate strict
> aliasing but supports *_PREV and *_LAST.
> 
> All that said, it is not clear to me why you need reverse iteration
> here.  None of your patches seem to use it.  Are you planning to use
> it for some future purpose?

Historical the mountlist gets traversed in mount order.  Some syscalls
return in this order so it should be kept.

Unmounting all file system on shutdown has to be done in reverse mount
order so here (vfs_unmount_forceone / vfs_unmountall1) we need it.

>   +               /* Take an initial reference for vfs_busy() below. */
>   +               mp = me->me_mount;
>   +               KASSERT(mp != NULL);
>   +               atomic_inc_uint(&mp->mnt_refcnt);
>   +               mutex_exit(&mount_list_lock);
>   +
>   +               /* Try to mark this mount busy and return on success. */
>   +               if (vfs_busy(mp, NULL) == 0) {
>   +                       vfs_destroy(mp);
>   +                       return mp;
>   +               }
>   +               vfs_destroy(mp);
>   +               mutex_enter(&mount_list_lock);
> 
> Why do we need a new refcnt dance here?  Why isn't it enough to just
> use vfs_busy like all the existing callers?

Because vfs_busy() expects the caller to already hold a reference and
one of the goals here is to drop the mount_list_lock as early as possible.

> if (vfs_busy(mp, NULL) == 0) {
> 	mutex_exit(&mount_list_lock);
> 	return mp;
> }
> 
> Seems to me we ought to be removing code that names mnt->mnt_refcnt
> specifically, rather than adding more code.
> 
> (Then we could also move the `mutex_exit; return;' out of the for loop
> to make it look a little more symmetric in mountlist_iterator_next.)
> 
>   diff -r ff3f89481f2e -r 28fea04e7b15 sys/coda/coda_vfsops.c
>   --- a/sys/coda/coda_vfsops.c    Thu Mar 30 11:17:56 2017 +0200
>   +++ b/sys/coda/coda_vfsops.c    Thu Mar 30 11:17:57 2017 +0200
>   @@ -624,23 +624,20 @@
>   -    mutex_enter(&mountlist_lock);
>   -    TAILQ_FOREACH(mp, &mountlist, mnt_list) {
>   -       if ((!strcmp(mp->mnt_op->vfs_name, MOUNT_UFS)) &&
>   -           ((VFSTOUFS(mp))->um_dev == (dev_t) dev)) {
>   -           /* mount corresponds to UFS and the device matches one we want */
>   -           break;
>   -       }
>   +    if (spec_node_lookup_by_dev(VBLK, dev, &vp) == 0) {
>   +       mp = spec_node_getmountedfs(vp);
>   +       vrele(vp);
>   +    } else {
>   +       mp = NULL;
>        }
>   -    mutex_exit(&mountlist_lock);
> 
> Commit this change separately first?

Sure.


>   ml_iterator_vfs_busy
> 
>   Drop the mountlist processing from vfs_busy() and vfs_unbusy().
> 
>   diff -r 28fea04e7b15 -r bb476fea44cd sys/kern/vfs_mount.c
>   --- a/sys/kern/vfs_mount.c      Thu Mar 30 11:17:57 2017 +0200
>   +++ b/sys/kern/vfs_mount.c      Thu Mar 30 11:17:57 2017 +0200
>   @@ -307,26 +307,26 @@
>     * => Will fail if the file system is being unmounted, or is unmounted.
>     */
>    int
>   -vfs_busy(struct mount *mp, struct mount **nextp)
>   +vfs_busy(struct mount *mp, int flags)
>    {
> 
>           KASSERT(mp->mnt_refcnt > 0);
>   +       KASSERT((flags & ~MNT_NOWAIT) == 0);
> 
> You seem to have put an unrelated change into this commit -- a change
> to add a flags argument to vfs_busy.  Separate commit, please?

Ok.

> Not clear without further explanation what the purpose of this change
> is.  I would also be inclined to say vfs_trybusy(mp) rather than
> vfs_busy(mp, MNT_NOWAIT), particularly since MNT_NOWAIT is derived
> from a different context.

This change will help the syncer to skip unmounting file systems.
Should be enough to remove the syncer dance from unmount.

But ok to use vfs_trybusy() here.

> Can you make another separate commit to drop the keepref parameter to
> vfs_unbusy, and change all the callers with keepref=true to just use
> vfs_destroy themselves?

You mean the opposite (keepref=false -> caller does vfs_destroy), yes?

Of the 81 usages of vfs_unbusy() only 6 have keepref=true, all others
would have to be changed to "vfs_unbusy(mp); vfs_destroy(mp);".

--
J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig (Germany)

Follow-Ups:
- Re: Add a mountlist iterator
  - From: Taylor R Campbell

References:
- Re: Add a mountlist iterator
  - From: Taylor R Campbell

Prev by Date: Re: locking from VOP_INACTIVE to VOP_RECLAIM
Next by Date: Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL
Previous by Thread: Re: Add a mountlist iterator
Next by Thread: Re: Add a mountlist iterator
Indexes:

Home | Main Index | Thread Index | Old Index