tech-net archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: infinite recursion in m_split()



On Fri, Apr 03, 2009 at 01:22:30AM +0000, YAMAMOTO Takashi wrote:
> hi,
> 
> > Hi,
> > While debugging NFS server issues I found a infinite recursion in m_split
> > when it's called with len0 == 0.
> > it's called this way from nfsrv_getstream() (and this is even documented
> > in the comment :):
> >                 /*
> >                  * Now get the record part.
> >                  *
> >                  * Note that slp->ns_reclen may be 0.  Linux sometimes
> >                  * generates 0-length records.
> >                  */
> >                 if (slp->ns_cc == slp->ns_reclen) {
> >                         recm = slp->ns_raw;
> >                         slp->ns_raw = slp->ns_rawend = (struct mbuf *)0;
> >                         slp->ns_cc = slp->ns_reclen = 0;
> >                 } else if (slp->ns_cc > slp->ns_reclen) {
> >                         recm = slp->ns_raw;
> >                         m = m_split(recm, slp->ns_reclen, waitflag);
> > 
> > Then m_split() calls m_split0, which, if (m0->m_flags & M_PKTHDR)
> > is true and (m->m_flags & M_EXT) is false, will call m_split() with
> > the same mbuf and the same m0. This is an infinite loop (well, infinite
> > until stack is exhausted).
> 
> good catch.

Well, not so good. The conditions which cause this recursion should not
happen: If m == m0 (i.e. the loop at the start of m_split0() dind't
progress), and (m0->m_flags & M_PKTHDR) is true, then we either
have remain (= m0->m_len in this specific case) <= MHLEN or m0 has
an external storage. So the recursion happens when something wrong
already happened. The attached patch gives more details on how we get in
this situation:
m 0xc3bae000 m0 0xc3bae000 remain 1444 len_save -4 len0 0 len 0 m_len 1444 
MHLEN 200
panic: m_split0

This mbuf obviously has a cluster attached (gdb on the core dump confirmed),
yet it doesn't have M_EXT set (the (m->m_flags & M_EXT) goto didn't
catch it, and gdb on the core dump also confirms). So the mbuf got partially
updated before m_split() I also guess m0->m_pkthdr.len is not supposed
to be negative, yet we have len_save as -4 (I've also seen -252 or -260).
So something is putting wrong values in the mbuf before calling m_split()
on it, but I don't know where it happens.
The printf() I added to check for len0 == 0 at the top of m_split()
fires only once, so I guess when slp->ns_reclen == 0 damage has already
been done to the associated mbuf; it could be related to this
specific condition.

Any idea very welcome.

-- 
Manuel Bouyer, LIP6, Universite Paris VI.           
Manuel.Bouyer%lip6.fr@localhost
     NetBSD: 26 ans d'experience feront toujours la difference
--


Home | Main Index | Thread Index | Old Index