Re: vnode lock and v_numoutput
On Sat, Jan 24, 2015 at 05:32:41PM +0000, Taylor R Campbell wrote:
> > Date: Sat, 24 Jan 2015 14:34:24 +0100
> > From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
> >
> > I guess what happens is:
> > vndthread is blocked trying to get the vn_lock because the thread
> > doing the sys_fsync() is holding it.
> > The thread in sys_fsync() waits for v_numoutput to go down to 0,
> > but this won't happen because vndthread has increased v_numoutput but
> > has not queued the I/O yet.
>
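To spell the interleaving out (schematic only, not the exact code; the
fsync side stands for whatever the file system's fsync path does to drain
v_numoutput, and I'm assuming the usual v_interlock/v_cv protocol):

	/* vndthread, handle_with_strategy(), before the patch below: */
	mutex_enter(vp->v_interlock);
	vp->v_numoutput++;			/* buf counted ... */
	mutex_exit(vp->v_interlock);
	/* ... then, later ... */
	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);	/* sleeps: fsync holds the lock */
	/* the buf is only queued (VOP_STRATEGY/nestiobuf_done) after this */

	/* sys_fsync path, with the vnode lock held: */
	mutex_enter(vp->v_interlock);
	while (vp->v_numoutput > 0)		/* never reaches 0, because the */
		cv_wait(&vp->v_cv, vp->v_interlock); /* counted buf is never queued */
	mutex_exit(vp->v_interlock);

Neither side can make progress, hence the hang.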
> Whatever v_numoutput is accounting needs to be done alongside the
> accounting if anything is going to cv_wait on it. It is clearly wrong
> API design (no surprise) that there are many places in the tree that
> read and modify v_numoutput directly. For example, why does
> genfs_do_io increment v_numoutput by 2??
>
> Step #1 is to identify what v_numoutput is supposed to be accounting,
> document it, and move all uses of it to a single abstraction in
> vfs_vnode.c...
Yes, I agree it's not clear what this is supposed to do. This could probably
be improved (but it's not trivial).
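If someone picks this up, I'd imagine something along these lines living in
vfs_vnode.c (the names are made up, and this is just the existing
v_interlock/v_cv protocol wrapped up, so that callers stop touching
v_numoutput by hand):

/* Hypothetical helpers -- not existing API, just to sketch the idea. */
void
vnode_output_begin(struct vnode *vp)
{

	mutex_enter(vp->v_interlock);
	vp->v_numoutput++;	/* caller must now queue the buf without sleeping */
	mutex_exit(vp->v_interlock);
}

void
vnode_output_done(struct vnode *vp)
{

	mutex_enter(vp->v_interlock);
	KASSERT(vp->v_numoutput > 0);
	if (--vp->v_numoutput == 0)
		cv_broadcast(&vp->v_cv);	/* wake fsync-style waiters */
	mutex_exit(vp->v_interlock);
}

void
vnode_output_drain(struct vnode *vp)
{

	mutex_enter(vp->v_interlock);
	while (vp->v_numoutput > 0)
		cv_wait(&vp->v_cv, vp->v_interlock);
	mutex_exit(vp->v_interlock);
}

The documented rule would then be that vnode_output_begin() has to be
followed by queueing the buf with nothing that can sleep in between, which
is exactly what vnd gets wrong today.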
>
> > Should a thread hold the vnode lock (vn_lock()) before increasing v_numoutput ?
>
> No, I don't think so. It seems to be the number of pending bufs that
> have been submitted to the disk device to satisfy writes to the vnode,
> so there's no invariant related to the per-file-system state.
>
> It looks to me as though it is wrong to
>
> (a) increase v_numoutput,
> (b) wait for the vnode lock, and then
> (c) submit the buf you just counted in v_numoutput,
>
> because we can't lock the vnode until the buf is completed if fsync
> began waiting between (a) and (b). (`Submit' in (c) means sending to
> a disk, e.g. with VOP_STRATEGY, or, in this case, just calling
> nestiobuf_done.)
>
> On the other hand, it is also wrong to
>
> (a) submit a buf, and then
> (b) increase v_numoutput,
>
> because there may be outstanding bufs which an fsync initiated between
> (a) and (b) won't notice.
>
> But is that wrong for the parent buf of a nestiobuf transaction?
> These don't actually correspond to data themselves, and it seems that
> the purpose of the v_numoutput accounting is to notify fsync when all
> data submitted to disk have been written. So based on this analysis I
> suspect it would be safe to move the v_numoutput++ to just before the
> call to nestiobuf_done in handle_with_strategy.
No, it's not: the call to nestiobuf_done() may be a no-op (it probably is
in most cases) when skipped is 0. The real biodone() may happen before it,
so we need to do the v_numoutput++ before that.
The attached patch increases v_numoutput for the parent buf just before
queueing the last nested buf. I think this is the correct fix (or should
we call it a workaround?). With this I couldn't reproduce the problem
on my test machine.
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference
--
Index: vnd.c
===================================================================
RCS file: /cvsroot/src/sys/dev/vnd.c,v
retrieving revision 1.219.8.2
diff -u -p -u -r1.219.8.2 vnd.c
--- vnd.c 5 Jul 2012 18:12:46 -0000 1.219.8.2
+++ vnd.c 24 Jan 2015 21:41:11 -0000
@@ -785,12 +785,6 @@ handle_with_strategy(struct vnd_softc *v
 	flags = obp->b_flags;
 
-	if (!(flags & B_READ)) {
-		vp = bp->b_vp;
-		mutex_enter(vp->v_interlock);
-		vp->v_numoutput++;
-		mutex_exit(vp->v_interlock);
-	}
 
 	/* convert to a byte offset within the file. */
 	bn = obp->b_rawblkno * vnd->sc_dkdev.dk_label->d_secsize;
 
@@ -862,6 +856,21 @@ handle_with_strategy(struct vnd_softc *v
 			    nbp->vb_buf.b_flags, nbp->vb_buf.b_data,
 			    nbp->vb_buf.b_bcount);
 #endif
+		if (!(flags & B_READ) && resid == sz) {
+			struct vnode *w_vp;
+			/*
+			 * This is the last nested buf; account for
+			 * the parent buf's write too.
+			 * This has to be done last, so that
+			 * fsync won't wait for this write, which
+			 * has no chance to complete before all nested bufs
+			 * have been queued.
+			 */
+			w_vp = bp->b_vp;
+			mutex_enter(w_vp->v_interlock);
+			w_vp->v_numoutput++;
+			mutex_exit(w_vp->v_interlock);
+		}
 		VOP_STRATEGY(vp, nbp);
 		bn += sz;
 	}