tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: vnode lock and v_numoutput



   Date: Sat, 24 Jan 2015 14:34:24 +0100
   From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>

   I guess what happens is:
   vndthread is blocked trying to get the vn_lock because the  thread
   doing the sys_fsync() is holding it.
   The thread in sys_fsync() waits for v_numoutput to go down to 0,
   but this won't happen because vndthread has increased v_numoutput but
   has not queued the I/O yet.

Whatever v_numoutput is accounting needs to be done alongside the
accounting if anything is going to cv_wait on it.  It is clearly wrong
API design (no surprise) that there are many places in the tree that
read and modify v_numoutput directly.  For example, why does
genfs_do_io increment v_numoutput by 2??

Step #1 is to identify what v_numoutput is supposed to be accounting,
document it, and move all uses of it to a single abstraction in
vfs_vnode.c...

   Should a thread hold the vnode lock (vn_lock()) before increasing v_numoutput ?

No, I don't think so.  It seems to be the number of pending bufs that
have been submitted to the disk device to satisfy writes to the vnode,
so there's no invariant related to the per-file-system state.

It looks to me as though it is wrong to

(a) increase v_numoutput,
(b) wait for the vnode lock, and then
(c) submit the buf you just counted in v_numoutput,

because we can't lock the vnode until the buf is completed if fsync
began waiting between (a) and (b).  (`Submit' in (c) means sending to
a disk, e.g. with VOP_STRATEGY, or, in this case, just calling
nestiobuf_done.)

On the other hand, it is also wrong to

(a) submit a buf, and then
(b) increase v_numoutput,

because there may be outstanding bufs which an fsync initiated between
(a) and (b) won't notice.

But is that wrong for the parent buf of a nestiobuf transaction?
These don't actually correspond to data themselves, and it seems that
the purpose of the v_numoutput accounting is to notify fsync when all
data submitted to disk have been written.  So based on this analysis I
suspect it would be safe to move the v_numoutput++ to just before the
call to nestiobuf_done in handle_with_strategy.


Aside: It looks like the v_numoutput += 2 in genfs_do_io represents
the parent buf of the nestiobuf transactions, and reusing that buf for
another iodone callback which calls vwakeup.


Home | Main Index | Thread Index | Old Index