Subject: kern/15463: NFS client bug - incorrect caching of file-handles
To: None <gnats-bugs@gnats.netbsd.org>
From: None <matthew@debian.org>
List: netbsd-bugs
Date: 02/02/2002 17:39:41
>Number:         15463
>Category:       kern
>Synopsis:       nfs client incorrectly caches file handles, and erroniously returns EINVAL
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 02 09:40:01 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Matthew Vernon
>Release:        NetBSD 1.5.1
>Organization:
The Debian project. Also, the University of Cambridge
>Environment:
nfs server is a Linux user-land server (Debian GNU/Linux testing)
The FS was stressed, as software was being built on the nfs-mounted partition
on which the fs-errors occured
System: NetBSD cleopatra 1.5.2 NetBSD 1.5.2 (GENERIC) #3: Sat Aug 18 23:37:05 CEST 2001 he@hamster.urc.uninett.no:/usr/src/sys/arch/i386/compile/GENERIC i386


>Description:
This problem came to light as I was trying to build .debs on an nfs-mounted
partition; apparantly-random errors (symlinks not being recognised as such,
spurious EISDIRs and so on) were being reported by netbsd, which did not 
match the state of the fs (when checked on the server, or using a different 
client). Accordingly, I ran tcpdump, mounted the partition, started building
some software, and waited for things to break. Much of the analysis of the 
resulting tcpdump data was done by Ian Jackson:

Here is my analysis, as requested.  I looked at the output of
 /usr/sbin/tcpdump -uvvvvs500 -r ethcapture not port 500
and found that the value of the relevant NFS filehandle,
 7A7F03700650E7626DA575000000000000000000000000000000000000000000
which I searched for.

Summary: the client seems to have cached the fact that the fh used to
refer to a symlink and keeps trying to readlink it despite several
lookup and getattr calls and EINVALs from readlink telling it that
it's a file.


The detailed sequence of events is as follows:

At 14:35:45.097718 the client looks up the name `Makefile' in some
directory or other and gets given the fh in question; it's told it's a
symlink.  The client removes it straight away by making an NFS remove
call on the name `Makefile' in that directory.

At 14:36:28.799930 the client creates a file `md2_dgst.o' in some
other directory.  Evidently the server reuses the original inode
number at this point, because the client gets the fh in question back
from the create call.

At 14:36:28.859574 the client calls getattr on the fh; it's told the
it's a file.

At 14:36:28.879670 the client looks up `md2_dgst.o' again and again
gets given the same fh, with a note saying that it's a file.  But, the
client tries to call readlink on it, which fails with EINVAL.

At 14:37:22.586938 the client does the same again - looks it up, is
given the fh and told it's a file, makes a failing call to readlink.

At 14:39:49.724363 the client calls getattr on the fh.  It again gets
told it's a file.  It then calls readlink straight away, which fails.

At 14:39:51.425057 the client calls readlink again, which fails.
I think this corresponds to the error message on the client:
 cat: /debian/import/openssl-0.9.6c/crypto/md2/md2_dgst.o: Invalid

You say that now, nearly an hour later, it's still broken.  The
conclusion I come to is that NetBSD has improperly cached the fact
that the fh used to refer to a symlink, despite much time passing and
much evidence to the contrary.

Relevant tcpdump bits below.

Ian.

14:35:45.096982 cleopatra.empire.pick.ucam.org.2482867553 > ming.empire.pick.ucam.org.nfs: 132 lookup fh Unknown/797F03700550E7626DA500000000000000000000000000000000000000000000 "Makefile" (ttl 64, id 50606, len 160)
14:35:45.097718 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482867553: reply ok 128 lookup fh Unknown/7A7F03700650E7626DA575000000000000000000000000000000000000000000 LNK 120777 ids 4015/500 sz 12 nlink 1 rdev 847 fsid 1 nodeid 70037f7a a/m/ctime 1012487679.000000 1012487679.000000 1012487679.000000  (ttl 64, id 43090, len 156)
14:35:45.098680 cleopatra.empire.pick.ucam.org.2482867554 > ming.empire.pick.ucam.org.nfs: 132 lookup fh Unknown/797F03700550E7626DA500000000000000000000000000000000000000000000 "Makefile" (ttl 64, id 50607, len 160)
14:35:45.099028 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482867554: reply ok 128 lookup fh Unknown/7A7F03700650E7626DA575000000000000000000000000000000000000000000 LNK 120777 ids 4015/500 sz 12 nlink 1 rdev 847 fsid 1 nodeid 7
14:35:45.099729 cleopatra.empire.pick.ucam.org.2482867555 > ming.empire.pick.ucam.org.nfs: 132 remove fh Unknown/797F03700550E7626DA500000000000000000000000000000000000000000000 "Makefile" (ttl 64, id 50608, len 160)
14:35:45.100131 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482867555: reply ok 28 remove (ttl 64, id 43092, len 56)

14:36:28.796326 cleopatra.empire.pick.ucam.org.2482873186 > ming.empire.pick.ucam.org.nfs: 136 lookup fh Unknown/797F03700550E7626DA500000000000000000000000000000000000000000000 "md2_dgst.o" (ttl 64, id 56494, len 164)
14:36:28.796860 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482873186: reply ok 28 lookup ERROR: No such file or directory (ttl 64, id 49113, len 56)
14:36:28.799265 cleopatra.empire.pick.ucam.org.2482873187 > ming.empire.pick.ucam.org.nfs: 168 create fh Unknown/797F03700550E7626DA500000000000000000000000000000000000000000000 "md2_dgst.o" (ttl 64, id 56495, len 196)
14:36:28.799930 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482873187: reply ok 128 create fh Unknown/7A7F03700650E7626DA575000000000000000000000000000000000000000000 REG 100644 ids 0/0 sz 0 nlink 1 rdev 201 fsid 1 nodeid 70037f7a a/m/ctime 1012487788.000000 1012487788.000000 1012487788.000000  (ttl 64, id 49114, len 156)

14:36:28.859574 cleopatra.empire.pick.ucam.org.2482873191 > ming.empire.pick.ucam.org.nfs: 120 getattr fh Unknown/7A7F03700650E7626DA575000000000000000000000000000000000000000000 (ttl 64, id 56499, len 148)
14:36:28.860327 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482873191: reply ok 96 getattr REG 100644 ids 0/0 sz 0  (ttl 64, id 49118, len 124)

14:36:28.879209 cleopatra.empire.pick.ucam.org.2482873196 > ming.empire.pick.ucam.org.nfs: 136 lookup fh Unknown/797F03700550E7626DA500000000000000000000000000000000000000000000 "md2_dgst.o" (ttl 64, id 56510, len 164)
14:36:28.879670 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482873196: reply ok 128 lookup fh Unknown/7A7F03700650E7626DA575000000000000000000000000000000000000000000 REG 100644 ids 0/0 sz 0 nlink 1 rdev 201 fsid 1 nodeid 70037f7a a/m/ctime 1012487788.000000 1012487788.000000 1012487788.000000  (ttl 64, id 49123, len 156)
14:36:28.881779 cleopatra.empire.pick.ucam.org.2482873197 > ming.empire.pick.ucam.org.nfs: 120 readlink fh Unknown/7A7F03700650E7626DA575000000000000000000000000000000000000000000 (ttl 64, id 56511, len 148)
14:36:28.882129 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482873197: reply ok 28 readlink ERROR: Invalid argument (ttl 64, id 49125, len 56)

14:37:22.586569 cleopatra.empire.pick.ucam.org.2482873261 > ming.empire.pick.ucam.org.nfs: 136 lookup fh Unknown/797F03700550E7626DA500000000000000000000000000000000000000000000 "md2_dgst.o" (ttl 64, id 56636, len 164)
14:37:22.586938 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482873261: reply ok 128 lookup fh Unknown/7A7F03700650E7626DA575000000000000000000000000000000000000000000 REG 100644 ids 0/0 sz 0 nlink 1 rdev 201 fsid 1 nodeid 70037f7a a/m/ctime 1012487788.000000 1012487788.000000 1012487788.000000  (ttl 64, id 49411, len 156)
14:37:22.587631 cleopatra.empire.pick.ucam.org.2482873262 > ming.empire.pick.ucam.org.nfs: 120 readlink fh Unknown/7A7F03700650E7626DA575000000000000000000000000000000000000000000 (ttl 64, id 56637, len 148)
14:37:22.587912 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482873262: reply ok 28 readlink ERROR: Invalid argument (ttl 64, id 49412, len 56)

14:39:49.724363 cleopatra.empire.pick.ucam.org.2482873296 > ming.empire.pick.ucam.org.nfs: 120 getattr fh Unknown/7A7F03700650E7626DA575000000000000000000000000000000000000000000 (ttl 64, id 56770, len 148)
14:39:49.724635 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482873296: reply ok 96 getattr REG 100644 ids 0/0 sz 0  (ttl 64, id 50150, len 124)
14:39:49.725293 cleopatra.empire.pick.ucam.org.2482873297 > ming.empire.pick.ucam.org.nfs: 120 readlink fh Unknown/7A7F03700650E7626DA575000000000000000000000000000000000000000000 (ttl 64, id 56771, len 148)
14:39:49.725590 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482873297: reply ok 28 readlink ERROR: Invalid argument (ttl 64, id 50151, len 56)
14:39:51.425057 cleopatra.empire.pick.ucam.org.2482873298 > ming.empire.pick.ucam.org.nfs: 120 readlink fh Unknown/7A7F03700650E7626DA575000000000000000000000000000000000000000000 (ttl 64, id 56775, len 148)
14:39:51.425452 ming.empire.pick.ucam.org.nfs > cleopatra.empire.pick.ucam.org.2482873298: reply ok 28 readlink ERROR: Invalid argument (ttl 64, id 50160, len 56)

>How-To-Repeat:
nfs-mount a partition from a linux server, and do something that involves a lot 
of creating and deleting files - such as building a large source package 
>Fix:
As noted above, the behaviour with regard to file-handles the client has cached
from nfs servers should be changed to behave more intelligently if the fh 
appears to no longer be what the client thought it was
>Release-Note:
>Audit-Trail:
>Unformatted: