NetBSD-Bugs archive
Re: kern/42455: tstile hang with nfs
The following reply was made to PR kern/42455; it has been noted by GNATS.
From: David Holland <dholland-bugs%netbsd.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Mon, 8 Nov 2010 03:16:42 +0000
(three messages not sent to gnats)
------
From: Christoph Egger <Christoph_Egger%gmx.de@localhost>
To: Christoph Egger <Christoph_Egger%gmx.de@localhost>
Cc: netbsd-bugs%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
kern-bug-people%netbsd.org@localhost, yamt%mwd.biglobe.ne.jp@localhost,
rmind%netbsd.org@localhost, enami%netbsd.org@localhost
Subject: Re: kern/42455: tstile hang with nfs
Date: Fri, 05 Nov 2010 10:13:39 +0100
yamt: Your guess is right: There is an mbuf leak through
the use of pool_cache(9) on 'mb_cache'.
In nfsm_disct() at line 963 m_get() is called.
m2 = m_get(M_WAIT, MT_DATA); <-- line 963
m_get() calls pool_cache_get().
There is a race where pool_cache_get() returns an mbuf
for the receiving mbuf chain that is still used
in the sending mbuf chain.
The sending mbuf chain is this (and nwc_mbufcount is 2):
db> show mbuf 0xffffa000013eea00
MBUF 0xffffa000013eea00
data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
owner=0xffffffff80bdd558, next=0xffffa000013c4c00, nextpkt=0x0
leadingspace=0, trailingspace=400, readonly=0
MBUF 0xffffa000013c4c00
data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
leadingspace=0, trailingspace=0, readonly=1
ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192,
ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
m_get() initializes the returned mbuf with m_next set to NULL.
So when m_get() does m->m_next = NULL; the sending mbuf
chain is this:
db> show mbuf 0xffffa000013eea00
MBUF 0xffffa000013eea00
data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
owner=0xffffffff80bdd558, next=0x0, nextpkt=0x0
leadingspace=0, trailingspace=400, readonly=0
db> show mbuf 0xffffa000013c4c00
MBUF 0xffffa000013c4c00
data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
leadingspace=0, trailingspace=0, readonly=1
ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192,
ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
The second mbuf is lost, so its ext_free hook is never called
to decrement nwc_mbufcount.
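
The failure mode described above can be illustrated with a small userland
sketch. This is not the NetBSD code: struct toy_mbuf, toy_freem(),
toy_get_reused() and toy_leak_demo() are hypothetical names, and for
simplicity both buffers carry a free hook (in the dump above only the second
mbuf has external storage). It only shows that once the head's m_next is
cleared, a walk over the chain never reaches the tail, so the tail's hook
never runs and the counter stays above zero.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a kernel mbuf; NOT the real struct mbuf. */
struct toy_mbuf {
    struct toy_mbuf *m_next;      /* next buffer in the chain */
    void (*ext_free)(void *arg);  /* hook run when the buffer is freed */
    void *ext_arg;                /* argument passed to the hook */
};

/* Hook modelled on a per-write counter: one decrement per freed buffer. */
void toy_dec_count(void *arg)
{
    int *count = arg;
    (*count)--;
}

/* Free a chain by following m_next; returns how many buffers were visited. */
int toy_freem(struct toy_mbuf *m)
{
    int visited = 0;

    while (m != NULL) {
        struct toy_mbuf *next = m->m_next;
        if (m->ext_free != NULL)
            m->ext_free(m->ext_arg);
        visited++;
        m = next;
    }
    return visited;
}

/* Models an allocator handing out a buffer that is still in use:
 * initialising the "new" buffer resets its link. */
struct toy_mbuf *toy_get_reused(struct toy_mbuf *still_in_use)
{
    assert(still_in_use != NULL);
    still_in_use->m_next = NULL;  /* severs the chain this buffer headed */
    return still_in_use;
}

/* Returns the counter value left after the chain has been freed. */
int toy_leak_demo(int corrupt)
{
    int mbufcount = 2;
    struct toy_mbuf tail = { NULL, toy_dec_count, &mbufcount };
    struct toy_mbuf head = { &tail, toy_dec_count, &mbufcount };

    if (corrupt)
        (void)toy_get_reused(&head);  /* head is handed out while live */

    (void)toy_freem(&head);
    return mbufcount;
}
```

Calling toy_leak_demo(0) frees both buffers and leaves the counter at 0;
toy_leak_demo(1) frees only the head and leaves it at 1, which is the kind of
stuck count a waiter would block on.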
Christoph
From: Christoph Egger <Christoph_Egger%gmx.de@localhost>
To: netbsd-bugs%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
kern-bug-people%netbsd.org@localhost
Cc: enami%netbsd.org@localhost, rmind%netbsd.org@localhost,
yamt%mwd.biglobe.ne.jp@localhost,
kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/42455: tstile hang with nfs
Date: Fri, 05 Nov 2010 14:27:19 +0100
>
> yamt: Your guess is right: There is an mbuf leak through
> the use of pool_cache(9) on 'mb_cache'.
>
> In nfsm_disct() at line 963 m_get() is called.
>
> m2 = m_get(M_WAIT, MT_DATA); <-- line 963
>
> m_get() calls pool_cache_get().
> There is a race where pool_cache_get() returns an mbuf
> for the receiving mbuf chain that is still used
> in the sending mbuf chain.
>
> The sending mbuf chain is this (and nwc_mbufcount is 2):
>
> db> show mbuf 0xffffa000013eea00
> MBUF 0xffffa000013eea00
> data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
> owner=0xffffffff80bdd558, next=0xffffa000013c4c00, nextpkt=0x0
> leadingspace=0, trailingspace=400, readonly=0
> MBUF 0xffffa000013c4c00
> data=0xffffa000221e6000, len=8192, type=1,
> flags=0x0x4000001<EXT,EXT_ROMAP>
> owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
> leadingspace=0, trailingspace=0, readonly=1
> ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192,
> ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
>
>
> m_get() initializes the returned mbuf with m_next set to NULL.
> So when m_get() does m->m_next = NULL; the sending mbuf
> chain is this:
>
>
> db> show mbuf 0xffffa000013eea00
> MBUF 0xffffa000013eea00
> data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
> owner=0xffffffff80bdd558, next=0x0, nextpkt=0x0
> leadingspace=0, trailingspace=400, readonly=0
> db> show mbuf 0xffffa000013c4c00
> MBUF 0xffffa000013c4c00
> data=0xffffa000221e6000, len=8192, type=1,
> flags=0x0x4000001<EXT,EXT_ROMAP>
> owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
> leadingspace=0, trailingspace=0, readonly=1
> ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192,
> ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
>
>
> The second mbuf is lost, ext_free hook is never called
> to decrease the nwc_mbufcount.
OK, that mbuf is not lost, at least not in m_get().
I figured out that m_ext_free() decrements ext_refcnt first.
This is what mreq contains at this point:
db> show mbuf /c 0xffffa00001203600
MBUF 0xffffa00001203600
data=0xffffa0000131f048, len=60, type=1,
flags=0x9000403<EXT,PKTHDR,CANFASTFWD,EXT_CLUSTER,EXT_RW>
owner=0xffffffff80bd6500, next=0x0, nextpkt=0x0
leadingspace=72, trailingspace=1916, readonly=0
pktlen=164, rcvif=0xffffa000248f6010,
csum_flags=0x0x4b<TCPv4,UDPv4,DATA,IPv4>, csum_data=0xffff, segsz=32136531
ext_refcnt=1, ext_buf=0xffffa0000131f000, ext_size=2048, ext_free=0x0,
ext_arg=0xffffa0002320d3d0
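
The observation that m_ext_free() decrements ext_refcnt first can be pictured
with the following userland sketch. It is an assumption about the general
shape of reference-counted external storage, not a copy of the kernel
routine: struct toy_ext, toy_ext_release(), toy_count_down() and
toy_refcnt_demo() are hypothetical names. The point is that with
ext_refcnt=4, as in the earlier dumps, dropping one reference does not by
itself run the ext_free hook; only the last release does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of shared external storage; NOT the real mbuf ext fields. */
struct toy_ext {
    unsigned refcnt;              /* number of buffers sharing the storage */
    void (*ext_free)(void *arg);  /* hook to run when the last user is gone */
    void *ext_arg;                /* argument passed to the hook */
};

/* Hook that decrements an int counter, standing in for a per-write count. */
void toy_count_down(void *arg)
{
    int *count = arg;
    (*count)--;
}

/* Drop one reference. Returns 1 if the free hook ran, 0 otherwise. */
int toy_ext_release(struct toy_ext *ext)
{
    assert(ext != NULL && ext->refcnt > 0);

    if (--ext->refcnt > 0)
        return 0;  /* other buffers still share the storage */
    if (ext->ext_free != NULL)
        ext->ext_free(ext->ext_arg);
    return 1;
}

/* Start with `refs` references, release `releases` of them (releases must
 * not exceed refs), and return the counter value afterwards. */
int toy_refcnt_demo(unsigned refs, unsigned releases)
{
    int mbufcount = 1;
    struct toy_ext ext = { refs, toy_count_down, &mbufcount };

    for (unsigned i = 0; i < releases; i++)
        (void)toy_ext_release(&ext);
    return mbufcount;
}
```

With four references, three releases leave the counter at 1 and the fourth
brings it to 0, so under this model a single reference that is never dropped
is enough to keep the hook from running.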
I would appreciate some help or guidance in hunting down this
bug. The networking area is completely new to me. *sigh*
Christoph
From: Christoph Egger <Christoph_Egger%gmx.de@localhost>
To: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
netbsd-bugs%netbsd.org@localhost
Cc: yamt%mwd.biglobe.ne.jp@localhost, rmind%netbsd.org@localhost,
enami%netbsd.org@localhost,
matt%netbsd.org@localhost
Subject: Re: kern/42455: tstile hang with nfs
Date: Fri, 05 Nov 2010 18:29:44 +0100
I have attached my current debug code.
When the bug hit, I got the output below.
Does anyone have an idea what is going wrong?
Can anyone tell me how to proceed?
m_get1: m 0xffffa00000fa1000, mb 0xffffa00000fa1000, mb->m_next 0xffffa000013d5e00, mreq 0xffffa000012ce800
nfsm_disct1: mb 0xffffa00000fa1000, mreq 0xffffa00000fa1000
nfs_writerpc: mbufcnt 2 mb 0xffffa00000fa1000, mreq 0xffffa00000fa1000, mrep 0xffffa000012ce800, md 0xffffa00000fa1000
nfsmblk timeout, mbufcount 1, mb 0xffffa00000fa1000, mreq 0xffffa00000fa1000, mrep 0xffffa000012ce800, md 0xffffa00000fa1000
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff802021a5 cs e030 rflags 286 cr2 7f7ffdfdc000
cpl 0 rsp ffffa0002610d9d0
Stopped in pid 0.46 (system) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
nfs_writerpc() at netbsd:nfs_writerpc+0x1033
nfs_doio() at netbsd:nfs_doio+0x4d0
nfssvc_iod() at netbsd:nfssvc_iod+0x17b
ds 0
es 0
fs 0xe033
gs 0x1000
rdi 0
rsi 0xffffffff80ee9000
rbp 0xffffa0002610d9d0
rbx 0x20c49ba5e353f7cf
rdx 0xffffffff80b73008 cpu_info_primary+0x1c8
rcx 0
rax 0
r8 0x400
r9 0
r10 0xffffa0002610d9c0
r11 0xe033
r12 0x1
r13 0x1
r14 0x23
r15 0xffffa0002610dbc0
rip 0xffffffff802021a5 breakpoint+0x5
cs 0xe030
rflags 0x286
rsp 0xffffa0002610d9d0
ss 0xe02b
netbsd:breakpoint+0x5: leave
db> show mbuf /c 0xffffa00000fa1000
MBUF 0xffffa00000fa1000
data=0xffffa000011b5800, len=90, type=1,
flags=0x900010b<EXT,PKTHDR,PROTO1,BCAST,EXT_CLUSTER,EXT_RW>
owner=0xffffa000248f63b0, next=0x0, nextpkt=0xffffa000011a9200
leadingspace=0, trailingspace=1958, readonly=0
pktlen=90, rcvif=0xffffa000248f6010, csum_flags=0x0x0, csum_data=0xffff,
segsz=32136531
ext_refcnt=1, ext_buf=0xffffa000011b5800, ext_size=2048, ext_free=0x0,
ext_arg=0xffffa0002320d3d0
db> show mbuf /c 0xffffa00000fa1000
MBUF 0xffffa00000fa1000
data=0xffffa000011b5800, len=90, type=1,
flags=0x900010b<EXT,PKTHDR,PROTO1,BCAST,EXT_CLUSTER,EXT_RW>
owner=0xffffa000248f63b0, next=0x0, nextpkt=0xffffa000011a9200
leadingspace=0, trailingspace=1958, readonly=0
pktlen=90, rcvif=0xffffa000248f6010, csum_flags=0x0x0, csum_data=0xffff,
segsz=32136531
ext_refcnt=1, ext_buf=0xffffa000011b5800, ext_size=2048, ext_free=0x0,
ext_arg=0xffffa0002320d3d0
db> show mbuf /c 0xffffa000013d5e00
MBUF 0xffffa000013d5e00
data=0xffffa000221e6000, len=8192, type=1, flags=0x4000001<EXT,EXT_ROMAP>
owner=0xffffffff80bdafe0, next=0x0, nextpkt=0x0
leadingspace=0, trailingspace=0, readonly=1
ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192,
ext_free=0xffffffff804e5b6f, ext_arg=0xffffa0002610da70
db> show mbuf /c 0xffffa000012ce800
MBUF 0xffffa000012ce800
data=0xffffa00001201802, len=2046, type=1,
flags=0x9000003<EXT,PKTHDR,EXT_CLUSTER,EXT_RW>
owner=0xffffffff80bdae58, next=0x0, nextpkt=0x0
leadingspace=2, trailingspace=0, readonly=0
pktlen=2046, rcvif=0x0, csum_flags=0x0x0, csum_data=0x0, segsz=32136531
ext_refcnt=1, ext_buf=0xffffa00001201800, ext_size=2048, ext_free=0x0,
ext_arg=0xffffa0002320d3d0
db>