NetBSD-Bugs archive


Re: kern/42455: tstile hang with nfs



On 28.10.10 07:45, YAMAMOTO Takashi wrote:
>  >>  >>  >>  >  I added some more debug lines and figured out that the macro
>  >>  >>  >>  >  nfsm_wcc_data() drops the mbuf chain w/o decreasing
>  >>  >>  >>  >  ctxt.nwc_mbufcount.
>  >>  >>  >>  
>  >>  >>  >>  The nfsm_wcc_data() macro calls the nfsm_postop_attr() macro.
>  >>  >>  >>  The nfsm_postop_attr() macro calls nfsm_loadattrcache() function.
>  >>  >>  >>  The nfsm_loadattrcache() function calls nfsm_disct() function.
>  >>  >>  >>  
>  >>  >>  >>  nfsm_disct() is the function in error which drops the mbuf chain.
>  >>  >>  
>  >>  >>  are you sure?
>  >>  > 
>  >>  > yes, absolutely and reproducible.
>  >>  > 
>  >>  >>  iirc, nwc_mbufcount is about sending mbuf.  otoh, nfsm_disct
>  >>  >>  is for received mbuf.
>  >>  > 
>  >>  > nfs_writerpc *does* call nfsm_disct() through nfsm_wcc_data,
>  >>  > nfsm_postop_attr and nfsm_loadattrcache in this order.
>  >>  > 
>  >>  > So you are saying this should never happen?
>  >>  
>  >>  i'm saying i don't understand.
>  >>  
>  >>  nfs_writerpc sends a request to the server, using mreq and mb.
>  >>  it's what nwc_mbufcount is used for.
>  >>  
>  >>  it then parses the reply from the server, using mrep and md.
>  >>  it's what nfsm_wcc_data/nfsm_postop_attr/nfsm_loadattrcache/nfsm_disct
>  >>  are used for.
>  > 
>  > Ah, I see.
>  > 
>  >>  i don't understand how a problem in the latter causes the nwc_mbufcount
>  >>  problem.  the above two are somehow mixed up?
>  > 
>  > nfsm_disct() creates new mbufs with m_get() and MCLAIM().
>  > nfs_writerpc() relies on the ext hook being called on m_free.
>  > 
>  > But nfsm_disct() does *not* use MEXTADD(), so the ext hook is empty.
>  > => nfs_writerpc_extfree() won't be called to decrement nwc_mbufcount
>  
>  how is it a problem?  nwc_mbufcount is not incremented for the mbuf
>  allocated by nfsm_disct.
>  
>  > => nfs_writerpc() calls cv_wait() which waits forever.
>  
>  it waits for the sending mbuf chain to be consumed.  it's a separate mbuf
>  chain from the one nfsm_disct works on.

Or at least it should be a separate mbuf chain.
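The accounting pattern debated above can be modelled in user space. In that model, sending mbufs are counted in nwc_mbufcount and uncounted again by a free hook, while reply mbufs from nfsm_disct() are never counted. The sketch below is hypothetical and is not the NetBSD code:

- POSIX threads stand in for the kernel mutex and cv_wait().
- `struct fake_mbuf` with its `ext_free` pointer stands in for an mbuf attached with MEXTADD().
- `write_extfree()` plays the part of nfs_writerpc_extfree().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-RPC context; mbufcount mirrors nwc_mbufcount. */
struct write_ctx {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int             mbufcount;   /* sending mbufs not yet freed */
};

/* Hypothetical mbuf: ext_free is NULL unless a free hook was attached
 * (the role MEXTADD() plays in the kernel). */
struct fake_mbuf {
    struct fake_mbuf *next;
    void (*ext_free)(struct write_ctx *);
    struct write_ctx *ext_arg;
};

static void ctx_init(struct write_ctx *c) {
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cv, NULL);
    c->mbufcount = 0;
}

/* Free hook, standing in for nfs_writerpc_extfree(): uncount and wake the writer. */
static void write_extfree(struct write_ctx *c) {
    pthread_mutex_lock(&c->lock);
    assert(c->mbufcount > 0);    /* an underflow would mean a double free */
    if (--c->mbufcount == 0)
        pthread_cond_broadcast(&c->cv);
    pthread_mutex_unlock(&c->lock);
}

/* Sending side: every mbuf that borrows the writer's data is counted. */
static struct fake_mbuf *get_send_mbuf(struct write_ctx *c) {
    struct fake_mbuf *m = calloc(1, sizeof(*m));
    if (m == NULL)
        return NULL;
    pthread_mutex_lock(&c->lock);
    c->mbufcount++;
    pthread_mutex_unlock(&c->lock);
    m->ext_free = write_extfree;
    m->ext_arg = c;
    return m;
}

/* Receiving side (nfsm_disct()-style): plain allocation, no hook, no count. */
static struct fake_mbuf *get_plain_mbuf(void) {
    struct fake_mbuf *m = calloc(1, sizeof(*m));
    if (m != NULL) {
        m->ext_free = NULL;
        m->ext_arg = NULL;
    }
    return m;
}

/* Free a whole chain, running each mbuf's hook if it has one. */
static void free_chain(struct fake_mbuf *m) {
    while (m != NULL) {
        struct fake_mbuf *next = m->next;
        if (m->ext_free != NULL)
            m->ext_free(m->ext_arg);
        free(m);
        m = next;
    }
}

/* Writer side: block until every counted sending mbuf has been freed. */
static void wait_for_send_chain(struct write_ctx *c) {
    pthread_mutex_lock(&c->lock);
    while (c->mbufcount > 0)
        pthread_cond_wait(&c->cv, &c->lock);
    pthread_mutex_unlock(&c->lock);
}
```

In this model, freeing a plain reply-side mbuf leaves the counter alone, which is the point made in the quoted reply. A hang happens only when a counted sending mbuf is unlinked from its chain and never freed. Its hook then never runs, mbufcount stays above zero, and wait_for_send_chain() blocks forever.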

At rmind@'s suggestion, I looked at the life cycle of the sending mbuf
chain, i.e. 'mb' as used in nfs_writerpc().

I did this by duplicating
nfsm_wcc_data/nfsm_postop_attr/nfsm_loadattrcache/nfsm_disct in my
local tree and renaming the copies to

nfsm_wcc_data1/nfsm_postop_attr1/nfsm_loadattrcache1/nfsm_disct1

I extended them to take nwc_mbufcount and 'mb' as additional arguments.

I checked for the condition where nwc_mbufcount is >= 2 and mb->m_next
unexpectedly becomes NULL.

This happens in nfsm_disct() at line 968, as I already reported in
this PR.
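The check described above can be written as a small predicate and evaluated before and after each step of the duplicated parsing routines. `struct fake_mbuf` and `send_chain_truncated()` are hypothetical names used only for illustration. The condition itself, nwc_mbufcount >= 2 while mb->m_next is NULL, is the one from the report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal mbuf: only the link field matters here. */
struct fake_mbuf {
    struct fake_mbuf *m_next;
};

/* Returns 1 when the counter says at least two sending mbufs are still
 * pending but 'mb' no longer links to a successor, i.e. the condition
 * nwc_mbufcount >= 2 && mb->m_next == NULL from the report. */
static int send_chain_truncated(int mbufcount, const struct fake_mbuf *mb) {
    return mb != NULL && mbufcount >= 2 && mb->m_next == NULL;
}
```

Evaluating the predicate before and after each nested call (nfsm_wcc_data1, nfsm_postop_attr1, nfsm_loadattrcache1, nfsm_disct1) shows which step flips it from 0 to 1. In the report, that step was nfsm_disct().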

Christoph

