tech-kern archive


Re: NFS server mbuf leak



On Mon Jan 05 2009 at 15:26:31 +0100, Manuel Bouyer wrote:
> Hi,
> I have an mbuf leak problem on a netbsd-3 NFS server. Clients are mostly
> Linux boxes, using default NFS mount values (I think it's TCP,
> 32k read/write). Here's what I think triggered it:
> I replaced the hard disk of several partitions, so the mount
> points got freshly-newfsed filesystems. NFS clients which had the partitions
> mounted at that time got a "stale NFS file handle" error. One of the changed
> partitions holds users' home directories, so there are a lot of requests for
> the old FS. I started seeing the leak after NFS got enabled again, and
> it stopped once the clients got rebooted.
> 
> Today, a client in suspend got woken up, and the leak started again.
> I took time to investigate a bit (investigations got stopped by
> the reboot of the NFS client by the user):
> - MBUFTRACE confirms that the leak is in the NFS code. It's a mbuf+cluster
>   leak.
> - I suspect the error causing the leak is (from tcpdump)
>   "reply ok 36 access ERROR: Stale NFS file handle"
>   but I got about 700 of these in one minute, for only 60 mbufs leaked.
>   So it's not one mbuf leak per reply. The only other reply type I've
>   seen is "reply ok 32 getattr ERROR: Stale NFS file handle", but only 5 of
>   them in one minute, for 60 mbufs leaked.
> - I've not seen this in normal operation, even if there are lots of requests
>   for deleted files. So it could be related to the partition manipulations
>   I did on the server. Note that the server got rebooted several times
>   between the partition changes and the last occurrence of the problem,
>   so it's not caused by something stale on the server.
> 
> I looked at the source but didn't see anything obvious. The fact that
> there's not a 1-to-1 correspondence between replies and lost mbufs makes me
> think that there's another parameter that I haven't found yet ...
> 
> Any idea where to look?

Since it's netbsd-3, check whether revs 1.80 *and* 1.139 of nfs_syscalls.c help.

