Subject: NFS directory listings sometimes corrupted (truncated)
To: None <current-users@netbsd.org>
From: Mark Davies <mark@mcs.vuw.ac.nz>
List: current-users
Date: 12/15/2003 18:38:18
I have a directory that often has around 2000 to 3000 files in it that is NFS 
served (v3 TCP) from a 1.6ZC i386 box to a 1.6ZF i386 box.  Sometimes the 
client gets into a state where a listing of the directory only shows 
approximately 300 of those files and remains in that state until something 
causes it to request another READDIR, at which point it sees the full list
again.

For example, just now the directory had 2188 files in it, as an "ls | wc -l"
on the server indicated, but on the client machine "ls | wc -l" returned 327.
After I removed one file on the server, both machines agreed that there were
2187 files in the directory.
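
The comparison above could be scripted so that a short listing is noticed as
soon as it happens. This is only a rough sketch: the function names are
hypothetical, and it assumes the server's count is obtained separately (for
example by running the same count on the server) and passed in.

```python
import os

def count_entries(path):
    """Count directory entries, roughly what "ls | wc -l" reports."""
    # Plain "ls" hides dot-files, so skip them here as well.
    return sum(1 for name in os.listdir(path) if not name.startswith("."))

def listing_is_short(client_count, server_count):
    """Return True when the client sees fewer entries than the server."""
    return client_count < server_count

if __name__ == "__main__":
    # Counts from the example above: 2188 on the server, 327 on the client.
    print(listing_is_short(327, 2188))
```

On the client, count_entries() would be pointed at the NFS-mounted directory;
the path is left to the caller because it is not given here.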

I note from a tcpdump that the full ls listing requires 8 or 9 READDIR
requests and the response to each is up to 6 packets long.  I'm not sure
whether the 327 files correspond to some meaningful subset of those responses.
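
As a rough check on that question, the arithmetic can be sketched in Python.
It assumes, purely for illustration, that the entries are spread evenly over
the READDIR replies; in practice the number per reply depends on name lengths,
so this is only an estimate. The function names are hypothetical.

```python
def entries_per_reply(total_entries, replies):
    """Average number of directory entries carried by one READDIR reply."""
    return total_entries / replies

def replies_covered(seen_entries, total_entries, replies):
    """How many average-sized replies the truncated listing amounts to."""
    return seen_entries / entries_per_reply(total_entries, replies)

# Figures from the report: 2188 files, 8 or 9 replies, 327 files seen.
for replies in (8, 9):
    per_reply = entries_per_reply(2188, replies)
    covered = replies_covered(327, 2188, replies)
    print(replies, round(per_reply, 1), round(covered, 2))
```

Under that even-spread assumption, 327 entries come to more than one reply's
worth but fewer than two, so the truncated listing does not line up with a
whole number of average-sized replies.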

I believe the 1.6ZF client postdates all the recent NFS patches.  The 1.6ZC
server clearly doesn't, but I'm not sure this is a server issue.  As this
directory happens to be my mail inbox, this behaviour is _very_ troubling.  I
don't believe I saw this problem with a 1.6L server and 1.6W client, but I
certainly did see it with a 1.6ZC client against the current server.

Any suggestions?

mark