NetBSD-Bugs archive


Re: kern/42455: tstile hang with nfs



The following reply was made to PR kern/42455; it has been noted by GNATS.

From: Christoph Egger <Christoph_Egger%gmx.de@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: YAMAMOTO Takashi <yamt%mwd.biglobe.ne.jp@localhost>, 
 kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, 
 netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/42455: tstile hang with nfs
Date: Thu, 28 Oct 2010 08:27:25 +0200

 On 28.10.10 07:45, YAMAMOTO Takashi wrote:
 >  >>  >>  >>  >  I added some more debug lines and figured out that the macro
 >  >>  >>  >>  >  nfsm_wcc_data() drops the mbuf chain w/o decreasing
 >  >>  >>  >>  >  ctxt.nwc_mbufcount.
 >  >>  >>  >>  
 >  >>  >>  >>  The nfsm_wcc_data() macro calls the nfsm_postop_attr() macro.
 >  >>  >>  >>  The nfsm_postop_attr() macro calls nfsm_loadattrcache() function.
 >  >>  >>  >>  The nfsm_loadattrcache() function calls nfsm_disct() function.
 >  >>  >>  >>  
 >  >>  >>  >>  nfsm_disct() is the function in error which drops the mbuf chain.
 >  >>  >>  
 >  >>  >>  are you sure?
 >  >>  > 
 >  >>  > yes, absolutely and reproducible.
 >  >>  > 
 >  >>  >>  iirc, nwc_mbufcount is about sending mbuf.  otoh, nfsm_disct
 >  >>  >>  is for received mbuf.
 >  >>  > 
 >  >>  > nfs_writerpc *does* call nfsm_disct() through nfsm_wcc_data,
 >  >>  > nfsm_postop_attr and nfsm_loadattrcache in this order.
 >  >>  > 
 >  >>  > So you are saying this should never happen?
 >  >>  
 >  >>  i'm saying i don't understand.
 >  >>  
 >  >>  nfs_writerpc sends a request to the server, using mreq and mb.
 >  >>  it's what nwc_mbufcount is used for.
 >  >>  
 >  >>  it then parses the reply from the server, using mrep and md.
 >  >>  it's what nfsm_wcc_data/nfsm_postop_attr/nfsm_loadattrcache/nfsm_disct are
 >  >>  used for.
 >  > 
 >  > Ah, I see.
 >  > 
 >  >>  i don't understand how a problem in the latter causes the nwc_mbufcount
 >  >>  problem.  the above two are somehow mixed up?
 >  > 
 >  > nfsm_disct() creates new mbufs with m_get() and MCLAIM().
 >  > nfs_writerpc() relies on the ext hook being called on m_free.
 >  > 
 >  > But nfsm_disct() does *not* use MEXTADD(), so the ext hook is empty.
 >  > => nfs_writerpc_extfree() won't be called to decrement nwc_mbufcount
 >  
 >  how is it a problem?  nwc_mbufcount is not incremented for the mbuf
 >  allocated by nfsm_disct.
 >  
 >  > => nfs_writerpc() calls cv_wait() which waits forever.
 >  
 >  it waits for the sending mbuf chain being consumed.  it's a separate mbuf
 >  chain from the one nfsm_disct works on.
 
 Or at least it should be a separate mbuf chain.
 
 At rmind@'s suggestion I looked at the life cycle of the sending mbuf
 chain, which is 'mb' in nfs_writerpc().
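
 For readers following along, here is a rough userspace model of that
 life cycle as it is described in this thread. It is only a sketch, not
 the kernel code: struct wctx, struct buf, and all function names are
 hypothetical stand-ins, and the blocking cv_wait() is reduced to a
 predicate. It assumes the counter is incremented only for send buffers
 that get a free hook attached, as stated in the quoted discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the write context; only the counter is modelled. */
struct wctx {
	unsigned int mbufcount;	/* counted send buffers not yet freed */
};

/* Hypothetical stand-in for an mbuf: a link plus an optional free hook. */
struct buf {
	struct buf *next;
	void (*ext_free)(void *arg);	/* NULL when no hook was attached */
	void *ext_arg;
};

/* Models the role of nfs_writerpc_extfree(): one send buffer was consumed. */
static void
wctx_extfree(void *arg)
{
	struct wctx *ctx = arg;

	assert(ctx->mbufcount > 0);
	ctx->mbufcount--;
}

/* Send path: attach the hook and count the buffer (the MEXTADD() case). */
static struct buf *
buf_alloc_send(struct wctx *ctx, struct buf *next)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (b == NULL)
		return NULL;
	b->next = next;
	b->ext_free = wctx_extfree;
	b->ext_arg = ctx;
	ctx->mbufcount++;
	return b;
}

/* Receive path: plain buffer, no hook, counter untouched (the m_get() case). */
static struct buf *
buf_alloc_plain(struct buf *next)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (b != NULL)
		b->next = next;
	return b;
}

/* Free a whole chain, running each buffer's hook if it has one. */
static void
buf_freem(struct buf *b)
{
	while (b != NULL) {
		struct buf *next = b->next;

		if (b->ext_free != NULL)
			b->ext_free(b->ext_arg);
		free(b);
		b = next;
	}
}

/* The writer would keep waiting while any counted send buffer remains. */
static int
wctx_would_block(const struct wctx *ctx)
{
	return ctx->mbufcount != 0;
}
```

 In this model, freeing a chain built with buf_alloc_plain() never
 touches the counter, so it can neither satisfy nor stall the wait. The
 counter can only get stuck if a counted send buffer is never freed, for
 example because it was unlinked from its chain before buf_freem() ran.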
 
 To do so, I duplicated
 nfsm_wcc_data/nfsm_postop_attr/nfsm_loadattrcache/nfsm_disct in my
 local tree as

 nfsm_wcc_data1/nfsm_postop_attr1/nfsm_loadattrcache1/nfsm_disct1

 and extended the duplicates to take nwc_mbufcount and 'mb' as
 arguments.
 
 The duplicated functions check for the condition where nwc_mbufcount
 is >= 2 and mb->m_next unexpectedly becomes NULL.

 This condition triggers in nfsm_disct() at line 968, as I already
 reported to this PR.
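
 The condition being checked can be written down roughly as follows.
 This is a standalone sketch rather than the instrumentation from the
 local tree: struct mbuf here is a minimal stand-in with only the chain
 link, and send_chain_truncated() is a hypothetical helper name.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct mbuf; only the link is modelled. */
struct mbuf {
	struct mbuf *m_next;
};

/*
 * Return 1 when the counter says at least two send mbufs are still
 * outstanding but the chain headed by mb has no second element,
 * otherwise 0.  A NULL mb is not reported.
 */
static int
send_chain_truncated(unsigned int mbufcount, const struct mbuf *mb)
{
	return mbufcount >= 2 && mb != NULL && mb->m_next == NULL;
}
```

 Calling such a helper before and after each step in the reply-parsing
 path narrows down the first point at which the counter and the chain
 disagree.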
 
 Christoph
 

