NetBSD-Bugs archive


Re: kern/42455: tstile hang with nfs



> 
> yamt: Your guess is right: There is an mbuf leak through
> the use of pool_cache(9) on 'mb_cache'.
> 
> In nfsm_disct() at line 963 m_get() is called.
> 
> m2 = m_get(M_WAIT, MT_DATA);  <-- line 963
> 
> m_get() calls pool_cache_get().
> There is a race where pool_cache_get() returns an mbuf
> for the receiving mbuf chain that is still used
> in the sending mbuf chain.
> 
> The sending mbuf chain is this (and nwc_mbufcount is 2):
> 
> db> show mbuf 0xffffa000013eea00
> MBUF 0xffffa000013eea00
>   data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
>   owner=0xffffffff80bdd558, next=0xffffa000013c4c00, nextpkt=0x0
>   leadingspace=0, trailingspace=400, readonly=0
> MBUF 0xffffa000013c4c00
>   data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
>   owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
>   leadingspace=0, trailingspace=0, readonly=1
>   ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
> 
> 
> m_get() initializes the returned mbuf with m_next set to NULL.
> So when m_get() does m->m_next = NULL; the sending mbuf
> chain is this:
> 
> 
> db> show mbuf 0xffffa000013eea00
> MBUF 0xffffa000013eea00
>   data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
>   owner=0xffffffff80bdd558, next=0x0, nextpkt=0x0
>   leadingspace=0, trailingspace=400, readonly=0
> db> show mbuf 0xffffa000013c4c00
> MBUF 0xffffa000013c4c00
>   data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
>   owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
>   leadingspace=0, trailingspace=0, readonly=1
>   ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
> 
> 
> The second mbuf is lost, ext_free hook is never called
> to decrease the nwc_mbufcount.
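
To make the hypothesis quoted above concrete, here is a minimal user-space
sketch of what it describes: an allocator hands out a buffer that still heads
a live chain, and clearing its link cuts off the rest of the chain. This is
not kernel code. toy_mbuf, toy_get(), chain_length() and simulate_reuse() are
hypothetical names made up for illustration; only the "m_next = NULL" step is
taken from the description of m_get() above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct mbuf; only the link field matters here. */
struct toy_mbuf {
    struct toy_mbuf *m_next;
    int len;
};

/* Count the buffers reachable from head. */
static int chain_length(const struct toy_mbuf *head)
{
    int n = 0;

    for (; head != NULL; head = head->m_next)
        n++;
    return n;
}

/* Mimics the single step of m_get() quoted above: clear the link. */
static struct toy_mbuf *toy_get(struct toy_mbuf *from_cache)
{
    from_cache->m_next = NULL;
    return from_cache;
}

/*
 * Build a two-buffer chain (len 56 and len 8192, as in the ddb output),
 * then pretend the allocator returns the first buffer while it is still
 * the head of that chain. Returns the chain length seen afterwards.
 */
static int simulate_reuse(void)
{
    struct toy_mbuf second = { NULL, 8192 };
    struct toy_mbuf first = { &second, 56 };

    assert(chain_length(&first) == 2);
    (void)toy_get(&first);      /* assumed premature hand-out */
    return chain_length(&first);
}
```

Calling simulate_reuse() walks the chain after the hand-out and no longer
reaches the second buffer, which is the effect the two ddb dumps above were
taken to show.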

OK, that mbuf is not lost, at least not in m_get().
I found out that m_ext_free() decrements ext_refcnt first.
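
As a rough illustration of why the ext_free hook may not have run yet, here
is a user-space sketch of reference-counted external storage. It assumes, as
I read the code, that the hook is invoked only when the last reference is
dropped. toy_ext, toy_ext_release(), count_down() and remaining_after() are
hypothetical names, not the kernel's, and the counter only stands in for this
buffer's share of nwc_mbufcount.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the external-storage part of an mbuf. */
struct toy_ext {
    int refcnt;                     /* plays the role of ext_refcnt */
    void (*ext_free)(void *arg);    /* plays the role of the ext_free hook */
    void *ext_arg;
};

/* Drop one reference; call the hook only when none are left (assumption). */
static void toy_ext_release(struct toy_ext *ext)
{
    if (--ext->refcnt > 0)
        return;                     /* storage is still shared */
    if (ext->ext_free != NULL)
        ext->ext_free(ext->ext_arg);
}

/* Hook that decrements a counter, standing in for nwc_mbufcount. */
static void count_down(void *arg)
{
    (*(int *)arg)--;
}

/*
 * Start with `refs` references and a counter of 1, drop `releases`
 * references (callers must keep releases <= refs), and return the counter.
 */
static int remaining_after(int refs, int releases)
{
    int counter = 1;
    struct toy_ext ext = { refs, count_down, &counter };

    for (int i = 0; i < releases; i++)
        toy_ext_release(&ext);
    return counter;
}
```

With ext_refcnt=4 as in the dump above, remaining_after(4, 1) leaves the
counter untouched; only remaining_after(4, 4) brings it down. In this model a
single free of the mbuf would therefore not be enough to decrease the count.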

This is what mreq contains at this point:
db> show mbuf /c 0xffffa00001203600
MBUF 0xffffa00001203600
  data=0xffffa0000131f048, len=60, type=1, flags=0x9000403<EXT,PKTHDR,CANFASTFWD,EXT_CLUSTER,EXT_RW>
  owner=0xffffffff80bd6500, next=0x0, nextpkt=0x0
  leadingspace=72, trailingspace=1916, readonly=0
  pktlen=164, rcvif=0xffffa000248f6010, csum_flags=0x0x4b<TCPv4,UDPv4,DATA,IPv4>, csum_data=0xffff, segsz=32136531
  ext_refcnt=1, ext_buf=0xffffa0000131f000, ext_size=2048, ext_free=0x0, ext_arg=0xffffa0002320d3d0


I wish I could get some help or guidance in hunting down this
bug. The networking area is completely new to me. *sigh*

Christoph

