NFS server mbuf leak

To: tech-kern%netbsd.org@localhost
Subject: NFS server mbuf leak
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Mon, 5 Jan 2009 15:26:31 +0100

Hi,
I have an mbuf leak problem on a netbsd-3 NFS server. Clients are mostly
Linux boxes, using default NFS mount values (I think it's TCP,
32k read/write). Here's what I think triggered it:
I replaced the hard disk holding several partitions, so the mount
points got freshly newfs'ed filesystems. NFS clients which had the
partitions mounted at that time got "Stale NFS file handle" errors. One of
the changed partitions holds the users' home directories, so there were a
lot of requests for the old FS. I started seeing the leak after NFS got
enabled again, and it stopped once the clients got rebooted.

Today, a suspended client got woken up, and the leak started again.
I took the time to investigate a bit (the investigation was cut short when
the user rebooted the NFS client):
- MBUFTRACE confirms that the leak is in the NFS code. It's an mbuf+cluster
  leak.
- I suspect the error causing the leak is (from tcpdump)
  "reply ok 36 access ERROR: Stale NFS file handle"
  but I got about 700 of these in one minute, for only 60 mbufs leaked.
  So it's not one mbuf leak per reply. The only other reply type I've
  seen is "reply ok 32 getattr ERROR: Stale NFS file handle", but only 5 of
  them in one minute, for 60 mbufs leaked.
- I've not seen this in normal operation, even when there are lots of
  requests for deleted files. So it could be related to the partition
  manipulations I did on the server. Note that the server got rebooted
  several times between the partition changes and the last occurrence of
  the problem, so it's not caused by something stale on the server.

I looked at the source but didn't see anything obvious. The fact that
there's no one-to-one correspondence between replies and lost mbufs makes me
think that there's another parameter that I haven't found yet ...
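
To put numbers on that mismatch, here is a minimal sketch in C. The function
name is made up for illustration and is not from the NFS code; the inputs are
the rough per-minute counts quoted above. It returns leaked mbufs per reply
in thousandths, to avoid floating point:

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only: leaked mbufs per reply,
 * scaled by 1000 and truncated toward zero.
 */
long
leaked_per_reply_milli(long leaked_mbufs, long replies)
{
	assert(replies > 0);	/* a rate over zero replies is meaningless */
	return (leaked_mbufs * 1000) / replies;
}
```

With 60 leaked mbufs against about 700 access replies this gives 85, i.e.
roughly one leaked mbuf per 12 replies; against the 5 getattr replies it
gives 12000, i.e. 12 mbufs per reply. Neither is close to one per reply.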

Any idea where to look?

-- 
Manuel Bouyer, LIP6, Universite Paris VI.           
Manuel.Bouyer%lip6.fr@localhost
     NetBSD: 26 ans d'experience feront toujours la difference
--

Follow-Ups:
- Re: NFS server mbuf leak
  - From: Antti Kantee
