NetBSD-Bugs archive


Re: kern/42455: tstile hang with nfs



The following reply was made to PR kern/42455; it has been noted by GNATS.

From: David Holland <dholland-bugs%netbsd.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: kern/42455: tstile hang with nfs
Date: Mon, 8 Nov 2010 03:16:42 +0000

 (three messages not sent to gnats)
 
    ------
 
 From: Christoph Egger <Christoph_Egger%gmx.de@localhost>
 To: Christoph Egger <Christoph_Egger%gmx.de@localhost>
 Cc: netbsd-bugs%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
        kern-bug-people%netbsd.org@localhost, yamt%mwd.biglobe.ne.jp@localhost,
        rmind%netbsd.org@localhost, enami%netsbd.org@localhost
 Subject: Re: kern/42455: tstile hang with nfs
 Date: Fri, 05 Nov 2010 10:13:39 +0100
 
 
 
 yamt: Your guess is right: There is an mbuf leak through
 the use of pool_cache(9) on 'mb_cache'.
 
 In nfsm_disct() at line 963 m_get() is called.
 
 m2 = m_get(M_WAIT, MT_DATA);  <-- line 963
 
 m_get() calls pool_cache_get().
 There is a race where pool_cache_get() returns an mbuf
 for the receiving mbuf chain that is still used
 in the sending mbuf chain.
 
 The sending mbuf chain is this (and nwc_mbufcount is 2):
 
 db> show mbuf 0xffffa000013eea00
 MBUF 0xffffa000013eea00
   data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
   owner=0xffffffff80bdd558, next=0xffffa000013c4c00, nextpkt=0x0
   leadingspace=0, trailingspace=400, readonly=0
 MBUF 0xffffa000013c4c00
   data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
   owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
   leadingspace=0, trailingspace=0, readonly=1
   ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
 
 
 m_get() initializes the returned mbuf with m_next set to NULL.
 So when m_get() does m->m_next = NULL; the sending mbuf
 chain is this:
 
 
 db> show mbuf 0xffffa000013eea00
 MBUF 0xffffa000013eea00
   data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
   owner=0xffffffff80bdd558, next=0x0, nextpkt=0x0
   leadingspace=0, trailingspace=400, readonly=0
 db> show mbuf 0xffffa000013c4c00
 MBUF 0xffffa000013c4c00
   data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
   owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
   leadingspace=0, trailingspace=0, readonly=1
   ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
 
 
 The second mbuf is lost, so the ext_free hook is never called
 to decrease nwc_mbufcount.
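
The effect described above can be reproduced in a minimal userspace sketch. This is not the kernel code: `struct node`, `chain_length()` and `buggy_get()` are hypothetical stand-ins for `struct mbuf`, a chain walk, and an allocator that hands out an object still linked into another chain. It only shows that clearing the link field of the head cuts the second element off the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf: only the link field matters here. */
struct node {
    struct node *next;   /* plays the role of m_next */
    int len;             /* plays the role of m_len */
};

/* Count the nodes reachable from head by following the next pointers. */
static int chain_length(const struct node *head)
{
    int n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/*
 * Hypothetical allocator that wrongly returns an object that is still
 * part of another chain, then initializes it the way m_get() is
 * described to: by setting the link field to NULL.
 */
static struct node *buggy_get(struct node *still_in_use)
{
    assert(still_in_use != NULL);
    still_in_use->next = NULL;   /* the "m->m_next = NULL" step */
    return still_in_use;
}
```

With a two-element chain (lengths 56 and 8192, as in the dump), `chain_length()` returns 2 before `buggy_get()` is called on the head and 1 afterwards; the second element still exists but is no longer reachable from the head.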
 
 Christoph
 
 
 From: Christoph Egger <Christoph_Egger%gmx.de@localhost>
 To: netbsd-bugs%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
        kern-bug-people%netbsd.org@localhost
 Cc: enami%netbsd.org@localhost, rmind%netbsd.org@localhost,
        yamt%mwd.biglobe.ne.jp@localhost,
        kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
        netbsd-bugs%netbsd.org@localhost
 Subject: Re: kern/42455: tstile hang with nfs
 Date: Fri, 05 Nov 2010 14:27:19 +0100
 
 
 > 
 > yamt: Your guess is right: There is an mbuf leak through
 > the use of pool_cache(9) on 'mb_cache'.
 > 
 > In nfsm_disct() at line 963 m_get() is called.
 > 
 > m2 = m_get(M_WAIT, MT_DATA);  <-- line 963
 > 
 > m_get() calls pool_cache_get().
 > There is a race where pool_cache_get() returns an mbuf
 > for the receiving mbuf chain that is still used
 > in the sending mbuf chain.
 > 
 > The sending mbuf chain is this (and nwc_mbufcount is 2):
 > 
 > db> show mbuf 0xffffa000013eea00
 > MBUF 0xffffa000013eea00
 >   data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
 >   owner=0xffffffff80bdd558, next=0xffffa000013c4c00, nextpkt=0x0
 >   leadingspace=0, trailingspace=400, readonly=0
 > MBUF 0xffffa000013c4c00
 >   data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
 >   owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
 >   leadingspace=0, trailingspace=0, readonly=1
 >   ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
 > 
 > 
 > m_get() initializes the returned mbuf with m_next set to NULL.
 > So when m_get() does m->m_next = NULL; the sending mbuf
 > chain is this:
 > 
 > 
 > db> show mbuf 0xffffa000013eea00
 > MBUF 0xffffa000013eea00
 >   data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
 >   owner=0xffffffff80bdd558, next=0x0, nextpkt=0x0
 >   leadingspace=0, trailingspace=400, readonly=0
 > db> show mbuf 0xffffa000013c4c00
 > MBUF 0xffffa000013c4c00
 >   data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
 >   owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
 >   leadingspace=0, trailingspace=0, readonly=1
 >   ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
 > 
 > 
 > The second mbuf is lost, so the ext_free hook is never called
 > to decrease nwc_mbufcount.
 
 OK, that mbuf is not lost, at least not in m_get().
 I figured out that m_ext_free() decrements ext_refcnt first.
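
A minimal userspace sketch of that behaviour, under the assumption that the free hook runs only when the last reference is dropped. The names `struct ext`, `struct write_ctx`, `ctx_release()` and `ext_release()` are hypothetical and only mirror ext_refcnt, ext_free, ext_arg and nwc_mbufcount; this is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the external-storage fields of an mbuf. */
struct ext {
    unsigned refcnt;                /* like ext_refcnt */
    void (*free_hook)(void *arg);   /* like ext_free */
    void *arg;                      /* like ext_arg */
};

/* Hypothetical stand-in for the context that counts outstanding mbufs. */
struct write_ctx {
    int mbufcount;                  /* like nwc_mbufcount */
};

/* Free hook: one fewer mbuf refers to the context's data. */
static void ctx_release(void *arg)
{
    struct write_ctx *ctx = arg;

    ctx->mbufcount--;
}

/*
 * Drop one reference to the external storage. The hook is called only
 * when the count reaches zero (an assumption of this sketch); earlier
 * calls just decrement the count.
 * Returns 1 if the hook ran, 0 otherwise.
 */
static int ext_release(struct ext *e)
{
    assert(e->refcnt > 0);
    if (--e->refcnt > 0)
        return 0;
    if (e->free_hook != NULL)
        e->free_hook(e->arg);
    return 1;
}
```

Under this assumption, storage that starts with a reference count of 4 needs four calls to `ext_release()` before the counter in the context is decremented; after the first three calls the counter is unchanged.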
 
 This is what mreq contains at this point:
 db> show mbuf /c 0xffffa00001203600
 MBUF 0xffffa00001203600
   data=0xffffa0000131f048, len=60, type=1, flags=0x9000403<EXT,PKTHDR,CANFASTFWD,EXT_CLUSTER,EXT_RW>
   owner=0xffffffff80bd6500, next=0x0, nextpkt=0x0
   leadingspace=72, trailingspace=1916, readonly=0
   pktlen=164, rcvif=0xffffa000248f6010, csum_flags=0x0x4b<TCPv4,UDPv4,DATA,IPv4>, csum_data=0xffff, segsz=32136531
   ext_refcnt=1, ext_buf=0xffffa0000131f000, ext_size=2048, ext_free=0x0, ext_arg=0xffffa0002320d3d0
 
 
 I would appreciate some help or guidance in hunting down this
 bug. The networking area is completely new to me. *sigh*
 
 Christoph
 
 
 
 From: Christoph Egger <Christoph_Egger%gmx.de@localhost>
 To: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
        netbsd-bugs%netbsd.org@localhost
 Cc: yamt%mwd.biglobe.ne.jp@localhost, rmind%netbsd.org@localhost,
        enami%netbsd.org@localhost,
        matt%netbsd.org@localhost
 Subject: Re: kern/42455: tstile hang with nfs
 Date: Fri, 05 Nov 2010 18:29:44 +0100
 
 
 I have attached my current debug code.
 When the bug hit, I got the output below.
 
 Does anyone have an idea what is going wrong?
 Can anyone tell me how to proceed?
 
 
 m_get1: m 0xffffa00000fa1000, mb 0xffffa00000fa1000, mb->m_next 0xffffa000013d5e00, mreq 0xffffa000012ce800
 nfsm_disct1: mb 0xffffa00000fa1000, mreq 0xffffa00000fa1000
 nfs_writerpc: mbufcnt 2 mb 0xffffa00000fa1000, mreq 0xffffa00000fa1000, mrep 0xffffa000012ce800, md 0xffffa00000fa1000
 nfsmblk timeout, mbufcount 1, mb 0xffffa00000fa1000, mreq 0xffffa00000fa1000, mrep 0xffffa000012ce800, md 0xffffa00000fa1000
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 rip ffffffff802021a5 cs e030 rflags 286 cr2  7f7ffdfdc000 cpl 0 rsp ffffa0002610d9d0
 Stopped in pid 0.46 (system) at netbsd:breakpoint+0x5:  leave
 breakpoint() at netbsd:breakpoint+0x5
 nfs_writerpc() at netbsd:nfs_writerpc+0x1033
 nfs_doio() at netbsd:nfs_doio+0x4d0
 nfssvc_iod() at netbsd:nfssvc_iod+0x17b
 ds          0
 es          0
 fs          0xe033
 gs          0x1000
 rdi         0
 rsi         0xffffffff80ee9000
 rbp         0xffffa0002610d9d0
 rbx         0x20c49ba5e353f7cf
 rdx         0xffffffff80b73008  cpu_info_primary+0x1c8
 rcx         0
 rax         0
 r8          0x400
 r9          0
 r10         0xffffa0002610d9c0
 r11         0xe033
 r12         0x1
 r13         0x1
 r14         0x23
 r15         0xffffa0002610dbc0
 rip         0xffffffff802021a5  breakpoint+0x5
 cs          0xe030
 rflags      0x286
 rsp         0xffffa0002610d9d0
 ss          0xe02b
 netbsd:breakpoint+0x5:  leave
 db> show mbuf /c 0xffffa00000fa1000
 MBUF 0xffffa00000fa1000
   data=0xffffa000011b5800, len=90, type=1, flags=0x900010b<EXT,PKTHDR,PROTO1,BCAST,EXT_CLUSTER,EXT_RW>
   owner=0xffffa000248f63b0, next=0x0, nextpkt=0xffffa000011a9200
   leadingspace=0, trailingspace=1958, readonly=0
   pktlen=90, rcvif=0xffffa000248f6010, csum_flags=0x0x0, csum_data=0xffff, segsz=32136531
   ext_refcnt=1, ext_buf=0xffffa000011b5800, ext_size=2048, ext_free=0x0, ext_arg=0xffffa0002320d3d0
 db> show mbuf /c 0xffffa00000fa1000
 MBUF 0xffffa00000fa1000
   data=0xffffa000011b5800, len=90, type=1, flags=0x900010b<EXT,PKTHDR,PROTO1,BCAST,EXT_CLUSTER,EXT_RW>
   owner=0xffffa000248f63b0, next=0x0, nextpkt=0xffffa000011a9200
   leadingspace=0, trailingspace=1958, readonly=0
   pktlen=90, rcvif=0xffffa000248f6010, csum_flags=0x0x0, csum_data=0xffff, segsz=32136531
   ext_refcnt=1, ext_buf=0xffffa000011b5800, ext_size=2048, ext_free=0x0, ext_arg=0xffffa0002320d3d0
 db> show mbuf /c 0xffffa000013d5e00
 MBUF 0xffffa000013d5e00
   data=0xffffa000221e6000, len=8192, type=1, flags=0x4000001<EXT,EXT_ROMAP>
   owner=0xffffffff80bdafe0, next=0x0, nextpkt=0x0
   leadingspace=0, trailingspace=0, readonly=1
   ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e5b6f, ext_arg=0xffffa0002610da70
 db> show mbuf /c 0xffffa000012ce800
 MBUF 0xffffa000012ce800
   data=0xffffa00001201802, len=2046, type=1, flags=0x9000003<EXT,PKTHDR,EXT_CLUSTER,EXT_RW>
   owner=0xffffffff80bdae58, next=0x0, nextpkt=0x0
   leadingspace=2, trailingspace=0, readonly=0
   pktlen=2046, rcvif=0x0, csum_flags=0x0x0, csum_data=0x0, segsz=32136531
   ext_refcnt=1, ext_buf=0xffffa00001201800, ext_size=2048, ext_free=0x0, ext_arg=0xffffa0002320d3d0
 db> 
 
 

