tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: [PATCH] Re: Locking in src/sys/dev/fss.c



> Date: Sat, 10 Aug 2024 00:44:53 +0000
> From: Emmanuel Dreyfus <manu%netbsd.org@localhost>
> 
> Let us forget about the snapshot creatin thread, it seems there
> are many locks missing. Attached patch tries to address that.

It's possible there's some locking missing, but in order to be verify
this, we need to have the access rules (like what locks have to be
held for reads from or writes to which state?) and lock order written
down.

So I suggest you start by writing down locking notes, like these:

https://nxr.netbsd.org/xref/src/sys/dev/dkwedge/dk.c?r=1.171#73

It may be that you can't find a consistent set of locking rules.  If
so, you might say what you think it should be and add an XXX
comment identifying where the rule is violated.

Some of your changes appear to introduce bugs.  For example:

 		if (error == 0 && sc->sc_state != FSS_IDLE) {
 			error = EBUSY;
 		} else {
+			char mntname[MNAMELEN];
+
 			sc->sc_state = FSS_CREATING;
-			copyinstr(fss->fss_mount, sc->sc_mntname,
-			    sizeof(sc->sc_mntname), NULL);
+			mutex_exit(&sc->sc_slock);
+
+			copyinstr(fss->fss_mount, mntname,
+			    sizeof(mntname), NULL);
+
+			mutex_enter(&sc->sc_slock);
+			memcpy(sc->sc_mntname, mntname,
+			    sizeof(sc->sc_mntname));

It is wrong to release the lock and then reacquire it, because while
it is unlocked, sc->sc_state may have changed concurrently.

In this case, to avoid copyin under the lock, you should copyin to a
temporary buffer _before_ taking the lock in the first place, 

Some of the locking you have added is unnecessary.  For example:

-	sc->sc_mount = vp->v_mount;
-	memcpy(sc->sc_mntname, sc->sc_mount->mnt_stat.f_mntonname, MNAMELEN);
+	mount = vp->v_mount;
+	mutex_enter(&sc->sc_slock);
+	sc->sc_mount = mount;
+	memcpy(sc->sc_mntname, mount->mnt_stat.f_mntonname, MNAMELEN);
+	mutex_exit(&sc->sc_slock);

This locking is not necessary because the caller already holds the
fss(4) instance in the FSS_CREATING state, which is exclusive, so no
other threads can be writing to sc->sc_mount at the same time.  Of
course, that reasoning may not be obvious -- which is why it's
important to have locking notes written down.

(If -- as I suggest below -- we change fss_unmount_hook to null out
sc_mount, the lock is still not necessary to exclude concurrent
fss_unmount_hook because the mount can't be unmounted at this point.)

> I wonder if I should not add a vfs_busy/vfs_unbusy in such situation:
>         mount = vp->v_mount;
> (...)
>         vrele(vp);
> (...)
>         check for another vnode's ->v_mount == mount

That sounds plausible.  But only inside fss_create_snapshot; outside
it, fss(4) automatically detaches itself from each mount -- and avoids
further reference to sc_mount -- via fss_vfs_hooks fss_unmount_hook.

It might be prudent for fss_unmount_hook (or maybe fss_error) to null
out sc_mount just to confirm that no further access to it is possible,
but it looks like all use of sc_mount is conditional on FSS_ERROR not
being set.


Home | Main Index | Thread Index | Old Index