Subject: NFS server ignoring writes?
To: None <current-users@netbsd.org>
From: Brian C. Grayson <bgrayson@marvin.ece.utexas.edu>
List: current-users
Date: 11/14/1998 00:13:40
We have an NFS server running NetBSD-1.3F (from July) on a
P-II called `latte', and a FreeBSD-3.0-19981103-SNAP NFS client
called `aulait'. The client can do access, lookup, and getattr
RPCs fine. However, when the client tries to deliver mail to
root from a cron job, it tries to append the mail to
/var/mail/root (which is on the NFS filesystem). The server
sees the writes but never sends any acks, so the client process
gets stuck, and all subsequent NFS I/O by the client hangs as
well. I'm enclosing tcpdump -vv -l output below, which was
gathered on the NetBSD server.
The server is latte, the client is aulait. One thing that
strikes me is that, when writing 8192 bytes, all of the frags
of the request say @0+, rather than @0+, @1480+, @2960+, etc.
Is this a bug on FreeBSD's part, or just misinterpretation by
tcpdump, or lack of understanding on my part? I've added some
commentary marked by ****. When I have physical access to the
machine again, I will try mounting with wsize=512 to see if
that helps, but that may not be for a few days.
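For reference, here is a minimal sketch of the fragment layout I
would expect for one large UDP datagram, printed in tcpdump's
size@offset notation. It assumes a 1500-byte Ethernet MTU and a
20-byte IP header with no options; the function name
fragment_layout is just illustrative, and the 8200-byte example
counts only the 8192 data bytes plus the 8-byte UDP header,
ignoring the RPC/NFS header bytes (whose exact size I have not
measured).

```python
def fragment_layout(ip_payload_len, mtu=1500, ip_header_len=20):
    """Return the expected IP fragments as tcpdump-style strings.

    Assumes a fixed IP header with no options. Every fragment but
    the last must carry a multiple of 8 bytes, since the IP
    fragment-offset field counts 8-byte units.
    """
    per_frag = (mtu - ip_header_len) // 8 * 8  # 1480 for MTU 1500
    frags = []
    offset = 0
    while offset < ip_payload_len:
        size = min(per_frag, ip_payload_len - offset)
        more = offset + size < ip_payload_len  # more-fragments flag
        frags.append(f"{size}@{offset}{'+' if more else ''}")
        offset += size
    return frags


# Hypothetical datagram: 8192 data bytes + 8-byte UDP header.
layout = fragment_layout(8200)
print(layout)
```

Under those assumptions the offsets come out as 0, 1480, 2960,
4440, 5920 and 7400. Note that each write line in the trace below
carries a different IP id (145 through 152), so they may each be
the first fragment of a separate datagram rather than successive
fragments of one; I can't tell for sure from this output alone.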
**** lookup and open /var/mail/root
20:28:20.185063 aulait.998563908 > latte.nfs: 104 lookup fh 4,0/1931 "root" (ttl 64, id 140)
20:28:20.185189 latte.nfs > aulait.998563908: reply ok 236 lookup fh 4,0/1931 REG 600 ids 0/0 sz 516096 nlink 1 rdev 691/184 fsid 2b3000000b8 nodeid b800000000 a/m/ctime 911010008.79877000 911006278.000000 911009944.279317000 post dattr: DIR 1777 ids 0/0 sz 512 nlink 3 rdev 690/239 fsid 2b2000000ef nodeid ef00000000 a/m/ctime 911010032.660117000 911010500.164397000 911010500.164397000 (ttl 64, id 40677)
20:28:20.185673 aulait.998563909 > latte.nfs: 100 access fh 4,0/1931 0002 (ttl 64, id 141)
20:28:20.185778 latte.nfs > aulait.998563909: reply ok 120 access attr: DIR 1777 ids 0/0 sz 512 nlink 3 rdev 690/239 fsid 2b2000000ef nodeid ef00000000 a/m/ctime 911010032.660117000 911010500.164397000 911010500.164397000 c 0002 (ttl 64, id 40678)
20:28:20.186119 aulait.998563910 > latte.nfs: 100 access fh 4,0/1931 000c (ttl 64, id 142)
20:28:20.186218 latte.nfs > aulait.998563910: reply ok 120 access attr: REG 600 ids 0/0 sz 516096 nlink 1 rdev 691/184 fsid 2b3000000b8 nodeid b800000000 a/m/ctime 911010008.79877000 911006278.000000 911009944.279317000 c 000c (ttl 64, id 40679)
**** get current size, presumably so that it can seek to the
**** EOF. Don't know why it does the same thing twice....
20:28:20.186628 aulait.998563911 > latte.nfs: 96 getattr fh 4,0/1931 (ttl 64, id 143)
20:28:20.186725 latte.nfs > aulait.998563911: reply ok 112 getattr REG 600 ids 0/0 sz 516096 (ttl 64, id 40680)
20:28:20.187367 aulait.998563912 > latte.nfs: 96 getattr fh 4,0/1931 (ttl 64, id 144)
20:28:20.187463 latte.nfs > aulait.998563912: reply ok 112 getattr REG 600 ids 0/0 sz 516096 (ttl 64, id 40681)
**** Issue the writes. Notice that the second, third, etc.
**** frags aren't shown at offsets 1480, 2960, ...
20:28:20.188513 aulait.998563913 > latte.nfs: 1472 write fh 4,0/1931 8192 bytes @ 516096 <unstable> (frag 145:1480@0+) (ttl 64)
20:28:22.067087 aulait.998563913 > latte.nfs: 1472 write fh 4,0/1931 8192 bytes @ 516096 <unstable> (frag 146:1480@0+) (ttl 64)
20:28:25.817010 aulait.998563913 > latte.nfs: 1472 write fh 4,0/1931 8192 bytes @ 516096 <unstable> (frag 147:1480@0+) (ttl 64)
20:28:33.307100 aulait.998563913 > latte.nfs: 1472 write fh 4,0/1931 8192 bytes @ 516096 <unstable> (frag 148:1480@0+) (ttl 64)
20:28:48.277230 aulait.998563913 > latte.nfs: 1472 write fh 4,0/1931 8192 bytes @ 516096 <unstable> (frag 149:1480@0+) (ttl 64)
20:29:18.207529 aulait.998563913 > latte.nfs: 1472 write fh 4,0/1931 8192 bytes @ 516096 <unstable> (frag 152:1480@0+) (ttl 64)
**** Those writes are never acked.
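The six write lines all carry the same RPC xid (998563913), so
they look like retransmissions of one request. A small sketch to
compute the gaps between them from the tcpdump timestamps above
(the helper name gaps is just illustrative):

```python
from datetime import datetime

# Timestamps of the six write requests, copied from the trace.
stamps = [
    "20:28:20.188513",
    "20:28:22.067087",
    "20:28:25.817010",
    "20:28:33.307100",
    "20:28:48.277230",
    "20:29:18.207529",
]


def gaps(stamps):
    """Return the seconds elapsed between consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


intervals = gaps(stamps)
for prev, cur in zip(intervals, intervals[1:]):
    # Print each gap and its ratio to the previous gap.
    print(f"{cur:.3f}s  ratio {cur / prev:.3f}")
```

The gaps roughly double each time (about 1.9s, 3.7s, 7.5s, 15s,
30s), which would be consistent with the client backing off and
retrying while the server stays silent.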
What's curious is that, on other machines, a FreeBSD machine
sends a write request for 8192 bytes but provides only the
first 1480 bytes, then sends a bunch of null requests. Is
this expected behavior?
Thanks in advance!
Brian