NetBSD-Bugs archive
Re: kern/42455: tstile hang with nfs
On 28.10.10 07:45, YAMAMOTO Takashi wrote:
> >> >> >> > I added some more debug lines and figured out that the macro
> >> >> >> > nfsm_wcc_data() drops the mbuf chain w/o decreasing
> >> >> >> > ctxt.nwc_mbufcount.
> >> >> >>
> >> >> >> The nfsm_wcc_data() macro calls the nfsm_postop_attr() macro.
> >> >> >> The nfsm_postop_attr() macro calls nfsm_loadattrcache() function.
> >> >> >> The nfsm_loadattrcache() function calls nfsm_disct() function.
> >> >> >>
> >> >> >> nfsm_disct() is the function in error which drops the mbuf chain.
> >> >>
> >> >> are you sure?
> >> >
> >> > yes, absolutely and reproducible.
> >> >
> >> >> iirc, nwc_mbufcount is about sending mbuf. otoh, nfsm_disct
> >> >> is for received mbuf.
> >> >
> >> > nfs_writerpc *does* call nfsm_disct() through nfsm_wcc_data,
> >> > nfsm_postop_attr and nfsm_loadattrcache in this order.
> >> >
> >> > So you are saying this should never happen?
> >>
> >> i'm saying i don't understand.
> >>
> >> nfs_writerpc sends a request to the server, using mreq and mb.
> >> it's what nwc_mbufcount is used for.
> >>
> >> it then parses the reply from the server, using mrep and md.
> >> it's what nfsm_wcc_data/nfsm_postop_attr/nfsm_loadattrcache/nfsm_disct
> >> are used for.
> >
> > Ah, I see.
> >
> >> i don't understand how a problem in the latter causes the nwc_mbufcount
> >> problem. the above two are somehow mixed up?
> >
> > nfsm_disct() creates new mbufs with m_get() and MCLAIM().
> > nfs_writerpc() relies on the ext hook being called on m_free.
> >
> > But nfsm_disct() does *not* use MEXTADD(), so the ext hook is empty.
> > => nfs_writerpc_extfree() won't be called to decrement nwc_mbufcount
>
> how is it a problem? nwc_mbufcount is not incremented for the mbuf
> allocated by nfsm_disct.
>
> > => nfs_writerpc() calls cv_wait() which waits forever.
>
> it waits for the sending mbuf chain being consumed. it's a separate mbuf
> chain from the one nfsm_disct works on.
Or at least it should be a separate mbuf chain.
At rmind@'s suggestion I looked at the life cycle of the sending mbuf
chain, which is 'mb' in nfs_writerpc().
To do that I duplicated
nfsm_wcc_data/nfsm_postop_attr/nfsm_loadattrcache/nfsm_disct in my local
tree as
nfsm_wcc_data1/nfsm_postop_attr1/nfsm_loadattrcache1/nfsm_disct1
and extended the copies to take nwc_mbufcount and 'mb' as arguments.
The copies check for the condition where nwc_mbufcount is >= 2 and
mb->m_next has unexpectedly become NULL.
That condition triggers in nfsm_disct() at line 968, as I already
reported to this PR.
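
To illustrate why a lost m_next link on the sending chain would matter,
here is a minimal userland sketch. It is only a model: struct write_ctx,
struct fake_mbuf, writerpc_extfree() and outstanding_after_free() are
made-up names, not the kernel's definitions, and the counter merely stands
in for nwc_mbufcount. It assumes every mbuf on the chain was counted once
and that only the free hook decrements the counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userland model only; none of this is the kernel's code. */
struct write_ctx {
	int mbufcount;			/* stands in for nwc_mbufcount */
};

struct fake_mbuf {
	struct fake_mbuf *m_next;
	void (*ext_free)(struct write_ctx *);	/* models the ext free hook */
	struct write_ctx *ctx;
};

/* Hypothetical hook: the only place the counter is decremented. */
static void
writerpc_extfree(struct write_ctx *ctx)
{
	ctx->mbufcount--;
}

/* Build a chain of n mbufs, counting each one as it is attached. */
static struct fake_mbuf *
chain_build(struct write_ctx *ctx, int n)
{
	struct fake_mbuf *head = NULL, **tail = &head;

	for (int i = 0; i < n; i++) {
		struct fake_mbuf *m = calloc(1, sizeof(*m));
		assert(m != NULL);
		m->ext_free = writerpc_extfree;
		m->ctx = ctx;
		ctx->mbufcount++;
		*tail = m;
		tail = &m->m_next;
	}
	return head;
}

/* Free every mbuf still reachable from m, running its hook. */
static void
chain_free(struct fake_mbuf *m)
{
	while (m != NULL) {
		struct fake_mbuf *next = m->m_next;
		if (m->ext_free != NULL)
			m->ext_free(m->ctx);
		free(m);
		m = next;
	}
}

/*
 * Build n mbufs, clear m_next after the first cut_after of them
 * (0 means leave the chain intact), free the chain from its head and
 * return the count that is still outstanding.
 */
int
outstanding_after_free(int n, int cut_after)
{
	struct write_ctx ctx = { 0 };
	struct fake_mbuf *head = chain_build(&ctx, n);
	struct fake_mbuf *lost = NULL;

	if (cut_after > 0 && cut_after < n) {
		struct fake_mbuf *m = head;
		for (int i = 1; i < cut_after; i++)
			m = m->m_next;
		lost = m->m_next;	/* tail that becomes unreachable */
		m->m_next = NULL;
	}
	chain_free(head);
	int left = ctx.mbufcount;

	/* Model cleanup only: release the orphans without running hooks. */
	while (lost != NULL) {
		struct fake_mbuf *next = lost->m_next;
		free(lost);
		lost = next;
	}
	return left;
}
```

With an intact chain, outstanding_after_free(3, 0) returns 0. If the link
after the first mbuf is cleared, outstanding_after_free(3, 1) returns 2,
and a waiter that sleeps until the count reaches zero would never be woken
in this model.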
Christoph