tech-kern archive


Re: NFS server mbuf leak



On Mon, Jan 05, 2009 at 04:33:29PM +0200, Antti Kantee wrote:
> > Today, a client in suspend got woken up, and the leak started again.
> > I took time to investigate a bit (investigations got stopped by
> > the reboot of the NFS client by the user):
> > - MBUFTRACE confirms that the leak is in the NFS code. It's an mbuf+cluster
> >   leak.
> > - I suspect the error causing the leak is (from tcpdump)
> >   "reply ok 36 access ERROR: Stale NFS file handle"
> >   but I got about 700 of these in one minute, for only 60 mbufs leaked.
> >   So it's not one mbuf leaked per reply. The only other reply type I've
> >   seen is "reply ok 32 getattr ERROR: Stale NFS file handle", but only 5 of
> >   them in one minute, for 60 mbufs leaked.
> > - I've not seen this in normal operation, even when there are lots of
> >   requests for deleted files. So it could be related to the partition
> >   manipulations I did on the server. Note that the server was rebooted
> >   several times between the partition changes and the last occurrence of
> >   the problem, so it's not caused by something stale on the server.
> > 
> > I looked at the source but didn't see anything obvious. The fact that
> > there's no one-to-one correspondence between replies and lost mbufs makes
> > me think that there's another parameter that I haven't found yet ...
> > 
> > Any idea where to look?
> 
> Since it's netbsd-3, check whether revs 1.80 *and* 1.139 of nfs_syscalls.c help.

Thanks! A new kernel is in place, but since I believe all the systems that had
the old disks mounted have been rebooted by now, it will be hard to confirm
that this fixes the issue ...

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference