Subject: kern/20870: nfs root mapping sometimes fails
To: None <gnats-bugs@gnats.netbsd.org>
From: None <martin@duskware.de>
List: netbsd-bugs
Date: 03/24/2003 09:07:03
>Number:         20870
>Category:       kern
>Synopsis:       nfs root mapping sometimes fails
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 24 00:08:01 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     Martin Husemann
>Release:        NetBSD 1.6P
>Organization:
	
>Environment:
System: NetBSD night-porter.duskware.de 1.6P NetBSD 1.6P (PORTER) #0: Sat Mar 8 09:11:04 CET 2003 martin@insomnia.duskware.de:/usr/src/sys/arch/i386/compile/PORTER i386
Architecture: i386
Machine: i386
>Description:

I use NFS mounts exported like this:

/usr -maproot=root -network 192.168.xxx.0 -mask 255.255.255.0

and sometimes the maproot=root part just fails. This shows up in varios ways,
for example not being able to create new files in directories where root would
be able to do so, and when I widen permissions the created file (as seen
from the server) looks like:

-rw-rw-r--  1 4294967294  user  0 Mar 24 09:00 blah

where it should be root and user. In this case the server is 386 (running a
few weeks old -current) and the client is sparc64 (running a few days old
-current) - just in case it matters. I think I've seen the failure between
two i386s too, but I'm not completely sure right now (I rarely use such
combinations a lot) and between sparc (client) and i386 (server).

When this happens, the effect is persistent until reboot. I did a tcpdump
from such a situation, here are the commands executed:

[/mp3] root@setting-sun > ll /mnt/home/martin/update-ce
-rwxr-----  1 martin  users  5085829120 Mar 20 16:39 /mnt/home/martin/update-ce*
[/mp3] root@setting-sun > tar xvpf /mnt/home/martin/update-ce
tar: Failed open to read on /mnt/home/martin/update-ce (Permission denied)

And here is the output of 
"tcpdump -vvv -s 2000 -l -i tlp0 host sun and port nfs" (setting-sun is
sparc64/client, night-porter is i386/server), the dump is from the server:

tcpdump: listening on tlp0
08:52:03.142221 setting-sun.duskware.de.3951722537 > night-porter.duskware.de.nfs: 124 access fh 0,4/1931 0002 (ttl 64, id 1495, len 152)
08:52:03.142359 night-porter.duskware.de.nfs > setting-sun.duskware.de.3951722537: reply ok 120 access attr: DIR 755 ids 0/0 sz 0x200 nlink 27 rdev 1/56 fsid 0x4 fileid 0x2 a/m/ctime 1048492199.886072000 1047203214.379436000 1047203214.379436000 c 0002 (ttl 64, id 18241, len 148)
08:52:03.142681 setting-sun.duskware.de.3951722538 > night-porter.duskware.de.nfs: 120 getattr fh 0,4/1931 (ttl 64, id 1496, len 148)
08:52:03.142761 night-porter.duskware.de.nfs > setting-sun.duskware.de.3951722538: reply ok 112 getattr DIR 755 ids 0/0 sz 0x200 (ttl 64, id 18242, len 140)
08:52:03.143025 setting-sun.duskware.de.3951722539 > night-porter.duskware.de.nfs: 124 access fh 0,4/1931 0002 (ttl 64, id 1497, len 152)
08:52:03.143099 night-porter.duskware.de.nfs > setting-sun.duskware.de.3951722539: reply ok 120 access attr: DIR 755 ids 0/0 sz 0x200 nlink 11 rdev 724/136 fsid 0x4 fileid 0xb241 a/m/ctime 1048492222.306675000 1046003101.862643000 1046003101.862643000 c 0002 (ttl 64, id 18243, len 148)
08:52:03.143337 setting-sun.duskware.de.3951722540 > night-porter.duskware.de.nfs: 120 getattr fh 0,4/1931 (ttl 64, id 1498, len 148)
08:52:03.143411 night-porter.duskware.de.nfs > setting-sun.duskware.de.3951722540: reply ok 112 getattr DIR 755 ids 205/100 sz 0x1800 (ttl 64, id 18244, len 140)
08:52:03.143679 setting-sun.duskware.de.3951722541 > night-porter.duskware.de.nfs: 124 access fh 0,4/1931 0002 (ttl 64, id 1499, len 152)
08:52:03.143751 night-porter.duskware.de.nfs > setting-sun.duskware.de.3951722541: reply ok 120 access attr: DIR 755 ids 205/100 sz 0x1800 nlink 103 rdev 459/184 fsid 0x4 fileid 0x2e94 a/m/ctime 1048492223.856717000 1048492152.464765000 1048492152.464765000 c 0002 (ttl 64, id 18245, len 148)
08:52:03.143984 setting-sun.duskware.de.3951722542 > night-porter.duskware.de.nfs: 120 getattr fh 0,4/1931 (ttl 64, id 1500, len 148)
08:52:03.144058 night-porter.duskware.de.nfs > setting-sun.duskware.de.3951722542: reply ok 112 getattr REG 740 ids 205/100 sz 0x12f239800 (ttl 64, id 18246, len 140)
08:52:10.498197 setting-sun.duskware.de.3951722543 > night-porter.duskware.de.nfs: 124 access fh 0,4/1931 0001 (ttl 64, id 1511, len 152)
08:52:10.498327 night-porter.duskware.de.nfs > setting-sun.duskware.de.3951722543: reply ok 120 access attr: REG 740 ids 205/100 sz 0x12f239800 nlink 1 rdev 191/64 fsid 0x4 fileid 0x2eb1 a/m/ctime 1048491890.526039000 1048174793.000000 1048491890.406034000 c 0000 (ttl 64, id 18262, len 148)

>How-To-Repeat:

I've seen this on various machines, happening from time to time.
Rebooting the NFS client seems to make the problem go away.

>Fix:
n/a
>Release-Note:
>Audit-Trail:
>Unformatted: