tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

PUFFS double reclaim



Hello

I noticed a PUFFS related bug after upgrading to NetBSD 8.0: there are
situations where the kernel sends multiple reclaims on a node. Since the first
reclaim is supposed to have freed the node, the second will crash the
filesystem.

Here is an exemple:

reqid: 899, opclass 2, optype: PUFFS_VN_LOOKUP, cookie: 0xbb451e40,
    aux: 0xbb42e02c, auxlen: 1352, pid: 28043, lwpid: 1
    puffs_cn: "10", len 2 op LOOKUP (flags 0xc014)
    since previous call: 0.000189
    new 0xbb476040, type 0x2, size 0x4096, dev 0xffffffff
RV reqid: 899, result: 0 
reqid: 900, opclass 2, optype: PUFFS_VN_INACTIVE, cookie: 0xbb476040,
    aux: 0xbb42f02c, auxlen: 44, pid: 28043, lwpid: 1
    since previous call: 0.004307
RV reqid: 900, result: 0 
(...)
reqid: 1069, opclass 2, optype: PUFFS_VN_LOOKUP, cookie: 0xbb451e40,
    aux: 0xbb42e02c, auxlen: 1352, pid: 28043, lwpid: 1
    puffs_cn: "10", len 2 op LOOKUP (flags 0xc054)
    since previous call: 0.000069
    new 0xbb476040, type 0x2, size 0x4096, dev 0xffffffff
RV reqid: 1069, result: 0 
reqid: 1070, opclass 2, optype: PUFFS_VN_ACCESS, cookie: 0xbb476040,
    aux: 0xbb42f02c, auxlen: 132, pid: 28043, lwpid: 1
    since previous call: 0.010458
RV reqid: 1070, result: 0 
reqid: 1071, opclass 2, optype: PUFFS_VN_OPEN, cookie: 0xbb476040,
    aux: 0xbb42e02c, auxlen: 136, pid: 28043, lwpid: 1
    mode: 0x600005
    since previous call: 0.000314
RV reqid: 1071, result: 0 
reqid: 1072, opclass 2, optype: PUFFS_VN_ACCESS, cookie: 0xbb476040,
    aux: 0xbb42f02c, auxlen: 132, pid: 28043, lwpid: 1
    since previous call: 0.002247
RV reqid: 1072, result: 0 
reqid: 1073, opclass 2, optype: PUFFS_VN_READDIR, cookie: 0xbb476040,
    aux: 0xbb49702c, auxlen: 4248, pid: 28043, lwpid: 1
    read offset: 0
    since previous call: 0.000196
    resid after op: 3800, eofflag 1
RV reqid: 1073, result: 0 
reqid: 1074, opclass 2, optype: PUFFS_VN_LOOKUP, cookie: 0xbb476040,
    aux: 0xbb42e02c, auxlen: 1352, pid: 28043, lwpid: 1
    puffs_cn: "pdata", len 5 op LOOKUP (flags 0xc014)
    since previous call: 0.039964
    new 0xbb41e480, type 0x2, size 0x4096, dev 0xffffffff
RV reqid: 1074, result: 0 
(...)
reqid: 1076, opclass 2, optype: PUFFS_VN_READDIR, cookie: 0xbb476040,
    aux: 0xbb49702c, auxlen: 4248, pid: 28043, lwpid: 1
    read offset: 296
    since previous call: 0.000199
    resid after op: 4096, eofflag 1
RV reqid: 1076, result: 0 
reqid: 0, opclass 2 (FAF), optype: PUFFS_VN_CLOSE, cookie: 0xbb476040,
    aux: 0xbb42e02c, auxlen: 132, pid: 28043, lwpid: 1
    since previous call: 0.000623
(...)
reqid: 1137, opclass 2, optype: PUFFS_VN_ACCESS, cookie: 0xbb476040,
    aux: 0xbb42e02c, auxlen: 132, pid: 28043, lwpid: 1
    since previous call: 0.000022
RV reqid: 1137, result: 0 
reqid: 1138, opclass 2, optype: PUFFS_VN_OPEN, cookie: 0xbb476040,
    aux: 0xbb42f02c, auxlen: 136, pid: 28043, lwpid: 1
    mode: 0x400001
    since previous call: 0.000177
RV reqid: 1138, result: 0 
reqid: 1139, opclass 2, optype: PUFFS_VN_ACCESS, cookie: 0xbb476040,
    aux: 0xbb42e02c, auxlen: 132, pid: 28043, lwpid: 1
    since previous call: 0.000166
RV reqid: 1139, result: 0 
reqid: 0, opclass 2 (FAF), optype: PUFFS_VN_CLOSE, cookie: 0xbb476040,
    aux: 0xbb42f02c, auxlen: 132, pid: 28043, lwpid: 1
    since previous call: 0.000271
(...)
reqid: 1695, opclass 2, optype: PUFFS_VN_INACTIVE, cookie: 0xbb476040,
    aux: 0xbb43002c, auxlen: 44, pid: 0, lwpid: 50
    since previous call: 0.000778
RV reqid: 1695, result: 0 
(...)
reqid: 1829, opclass 2, optype: PUFFS_VN_INACTIVE, cookie: 0xbb476040,
    aux: 0xbb42f02c, auxlen: 44, pid: 0, lwpid: 50
    since previous call: 0.000564
RV reqid: 1829, result: 0 
(...)
reqid: 0, opclass 2 (FAF), optype: PUFFS_VN_FSYNC, cookie: 0xbb476040,
    aux: 0xbb43002c, auxlen: 148, pid: 0, lwpid: 9
    since previous call: 0.000373
reqid: 4055, opclass 2, optype: PUFFS_VN_INACTIVE, cookie: 0xbb476040,
    aux: 0xbb43002c, auxlen: 44, pid: 0, lwpid: 9
    since previous call: 0.000021
RV reqid: 4055, result: 0 
reqid: 0, opclass 2 (FAF), optype: PUFFS_VN_RECLAIM, cookie: 0xbb476040,
    aux: 0xbb4da02c, auxlen: 48, pid: 0, lwpid: 9
    since previous call: 0.000753

 ----> here puffs_node_reclaim2 is called with nlookup = 2, we did 2 lookups, we
should free it.
 
(...)
reqid: 0, opclass 2 (FAF), optype: PUFFS_VN_FSYNC, cookie: 0xbb476040,
    aux: 0xbb42f02c, auxlen: 148, pid: 0, lwpid: 9
    since previous call: 0.263816
reqid: 0, opclass 2 (FAF), optype: PUFFS_VN_RECLAIM, cookie: 0xbb476040,
    aux: 0xbb42f02c, auxlen: 48, pid: 0, lwpid: 9
    since previous call: 0.000025

Use of freed node opc = 0xbb476040Abort (core dumped)

the second reclaim is called with nlookup = 0, which would let the user process
filter it, but unfortunately we will often crash on the fsync just before, where
we have no opportunity to detect the bug.

lwp 0.9 is vdrain:
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
0        9     3      0           280           c2d57aa0         vdrain puffsrpl

Here is its backtrace on the fsync after reclaim:
db{0}> bt/a c2d57aa0
trace: pid 0 lid 9 at 0xda6c9db0
sleepq_block(0,1,c04ecf37,c0589e70,c2d4bd40,c3379000,c900fdd4,c058a8c0,da6c9e28,
c0286522) at netbsd:sleepq_block+0xe8
cv_wait_sig(c900fdf8,c900fe04,da6c9dfc,da6c9e0c,c900e3c0,c900fe04,c2d57aa0,ffffb
ef8,ffffffff,ffffffff) at netbsd:cv_wait_sig+0xa7
puffs_msg_wait(c3379000,c900fdd4,c37e1dd4,c895b7ec,c3379000,da6c9e84,c028dae5,c3
379000,c900fdd4,c895b7ec) at netbsd:puffs_msg_wait+0x132
puffs_msg_wait2(c3379000,c900fdd4,c895b7ec,0,0,da6c9e84,c0384714,c058fd80,c895b8
70,c900e3c0) at netbsd:puffs_msg_wait2+0x27
puffs_vnop_inactive(da6c9e98,0,1000000,c04d2bb4,c37e1dd4,da6c9edf,c37e1dd4,da6c9
f30,c041522d,c37e1dd4) at netbsd:puffs_vnop_inactive+0x1a5
VOP_INACTIVE(c37e1dd4,da6c9edf,ffffffff,c2d57aa0,0,0,1,4,c2d57aa0,da6c9ee0) at n
etbsd:VOP_INACTIVE+0x4a
vcache_reclaim(c37e1e00,20012,c37e1dd4,c8ed8000,da6c9f8c,c04168d5,c37e1dd4,c0591
b80,c3137f18,0) at netbsd:vcache_reclaim+0xdd
vrecycle(c37e1dd4,c0591b80,c3137f18,0,c01149c1,da6c9f78,c3137f18,170a3,c0591c00,
c0591c40) at netbsd:vrecycle+0xc3
vdrain_thread(c2d57aa0,96e000,c0595200,0,c0100075,0,0,0,0,0) at netbsd:vdrain_th
read+0x385

vcache_reclaim calls VOP_INACTIVE and VOP_RECLAIM. A possible workaround is to
guard puffs_vnode_inactive and puffs_vnode_reclaim with a nlookup santity check:

        if (VPTOPP(vp)->pn_nlookup == 0);
                return 0;

But this may be hiding a more serious problem. Anyone has some insights about
this double reclaim?


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu%netbsd.org@localhost


Home | Main Index | Thread Index | Old Index