NFS server mbuf leak
I have an mbuf leak problem on a netbsd-3 NFS server. Clients are mostly
Linux boxes, using default NFS mount values (I think it's TCP,
32k read/write). Here's what I think triggered it:
I replaced the hard disk of several partitions, so the mount
points got freshly-newfsed filesystems. NFS clients which had the partitions
mounted at this time got "stale NFS file handle" errors. One of the changed
partitions holds users' home directories, so there are a lot of requests for
the old FS. I started seeing the leak after NFS got enabled again, and
it stopped once the clients got rebooted.
Today, a client in suspend got woken up, and the leak started again.
I took time to investigate a bit (investigations got stopped by
the reboot of the NFS client by the user):
- MBUFTRACE confirms that the leak is in the NFS code. It's a mbuf+cluster
- I suspect the error causing the leak is (from tcpdump)
"reply ok 36 access ERROR: Stale NFS file handle"
but I got about 700 of these in one minute, for only 60 mbufs leaked.
So it's not one mbuf leaked per reply. The only other reply type I've
seen is "reply ok 32 getattr ERROR: Stale NFS file handle", but only 5 of
them in one minute, for 60 mbufs leaked.
- I've not seen this in normal operation, even when there are lots of requests
for deleted files. So it could be related to the partition manipulations
I did on the server. Note that the server got rebooted several times
between the partition changes and the last occurrence of the problem,
so it's not caused by something stale on the server.
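To put numbers on the mismatch between replies and leaked mbufs, here is a
minimal sketch of the arithmetic. It uses only the per-minute counts quoted
above; the variable and function names are purely illustrative.

```python
# Counts observed over one minute, taken from the tcpdump/MBUFTRACE notes.
access_errors = 700   # "access ERROR: Stale NFS file handle" replies
getattr_errors = 5    # "getattr ERROR: Stale NFS file handle" replies
leaked_mbufs = 60     # mbufs leaked during the same minute


def replies_per_leak(replies, leaked):
    """Return how many replies of one type were seen per leaked mbuf."""
    return replies / leaked


access_per_leak = replies_per_leak(access_errors, leaked_mbufs)
getattr_per_leak = replies_per_leak(getattr_errors, leaked_mbufs)

# A 1:1 leak would give exactly 1.0 for one of the reply types.
print(f"access replies per leaked mbuf:  {access_per_leak:.2f}")
print(f"getattr replies per leaked mbuf: {getattr_per_leak:.2f}")
```

Neither ratio comes out at 1: there are roughly a dozen access replies per
leaked mbuf, and far fewer getattr replies than leaked mbufs.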
I looked at the source but didn't see anything obvious. The fact that
there's not a 1 for 1 correspondence between replies and lost mbufs makes me
think that there's another parameter that I haven't found yet ...
Any idea where to look?
Manuel Bouyer, LIP6, Universite Paris VI.
NetBSD: 26 years of experience will always make the difference