fstrans_start(vp->v_mount) and vgone

To: tech-kern%NetBSD.org@localhost
Subject: fstrans_start(vp->v_mount) and vgone
From: Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost>
Date: Tue, 11 Apr 2023 10:13:32 +0000

Last year hannken@ changed fstrans(9) to use a hash table mapping
struct mount pointers to fstrans info, in order to fix a crash found
by syzkaller arising from a race between fstrans_start(vp->v_mount)
and vgone(vp):

https://syzkaller.appspot.com/bug?extid=54dc9ac0804a97b59bc6
https://mail-index.netbsd.org/source-changes/2022/07/08/msg139622.html

However, while this may avoid a _crash_ from use-after-free of
vp->v_mount->mnt_..., I'm not sure this is enough to prevent a _race_.

Suppose we have the following sequence of events:

lwp0                    lwp1
----                    ----
mp = vp->v_mount;
                        vgone(vp);
                        -> vp->v_mount = deadfs
                        unmount(mp)
                        -> free(mp)
                        mount a new file system
                        -> malloc() returns the same mp
fstrans_start(mp);

Now lwp0 has started a transaction on some totally unrelated mount
point -- and may take confused actions under the misapprehension that
vp is a vnode that unrelated mount point.

I think in order to fix this properly, we really need an operation
that atomically (a) loads vp->v_mount, and (b) starts an fstrans on
it.

It's not necessary to use a hash table indirection for this -- we can
just use pserialize and ensure that struct mount objects aren't
actually freed until all pserialize read sections are completed.

Something like this:

struct mount *
fstrans_start_vnode(struct vnode *vp)
{
	struct mount *mp;
	struct fstrans_lwp_info *fli;
	int s;

	s = pserialize_read_enter();
	mp = atomic_load_relaxed(&vp->v_mount);
	fli = fstrans_get_lwp_info(mp, /*alloc*/false);
	if (fli == NULL) {
		pserialize_read_exit(s);
		/* XXX preallocate fli and retry with alloc=true */
	}
        if (fli->fli_trans_cnt > 0) {
		fli->fli_trans_cnt++;
		return mp;
	}
	fmi = fli->fli_mountinfo;
	if (__predict_true(grant_lock(fmi, ...))) {
		...
	}
	...
}

And fstrans_mount_dtor can do pserialize_perform (or just xc_barrier)
to wait for all these pserialize read sections to complete before
freeing mp.

Then, any code that currently does

	mp = vp->v_mount;
	fstrans_start(mp);

or equivalent, can be rewritten to do:

	mp = fstrans_start_vnode(vp);

Perhaps we can even forbid direct access to vp->v_mount to catch any
mistakes of this class -- and if there's a lot of access to it that
doesn't match this pattern, we could add a vnode_mount(vp) accessor to
assert that the caller holds an fstrans.

Thoughts?

Follow-Ups:
- Re: fstrans_start(vp->v_mount) and vgone
  - From: J. Hannken-Illjes

Prev by Date: Re: kernel goes dark on boot
Next by Date: Re: fstrans_start(vp->v_mount) and vgone
Previous by Thread: Radeon Driver Ioctl Permissions
Next by Thread: Re: fstrans_start(vp->v_mount) and vgone
Indexes:

Home | Main Index | Thread Index | Old Index