Subject: kern/13436: NFS File corruption Problem (same as kern/13361?)
To: None <gnats-bugs@gnats.netbsd.org>
From: Duncan McEwan <duncan@mcs.vuw.ac.nz>
List: netbsd-bugs
Date: 07/11/2001 23:50:32
>Number:         13436
>Category:       kern
>Synopsis:       NFS File corruption Problem (same as kern/13361?)
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 11 04:48:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Duncan McEwan
>Release:        NetBSD-current 1.5S - 1.5W
>Organization:
Victoria University of Wellington
	
>Environment:
System: NetBSD rialto.mcs.vuw.ac.nz 1.5U NetBSD 1.5U (MCS_WORKSTATION) #0: Wed Apr 11 15:36:35 NZST 2001 mark@turakirae.mcs.vuw.ac.nz:/src/work/src/sys/arch/i386/compile/MCS_WORKSTATION i386
Architecture: i386
Machine: i386
>Description:

For the last few months we've been observing a file corruption problem that we
think is associated with NFS.  This has been on 1.5S through to the 1.5W
systems that we are currently running.
 
The problem is that when files on an NFS server are updated in some particular
(so far unidentified) way, portions of the file are replaced by blocks of null
characters.
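
For anyone wanting to check a suspect file for this kind of damage, the
following is a minimal sketch of a scanner that reports long runs of null
bytes.  It is not part of our setup: the function name find_null_runs and the
512-byte threshold are assumptions chosen for illustration, and a file that
legitimately contains long zero-filled regions will also be reported.

```python
import sys


def find_null_runs(data: bytes, min_len: int = 512):
    """Return (offset, length) for each run of NUL bytes of at least min_len.

    min_len is an arbitrary illustrative threshold, not a value taken
    from the observed corruption.
    """
    runs = []
    start = None  # offset where the current NUL run began, if any
    for i, b in enumerate(data):
        if b == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    # Handle a run that extends to the end of the data.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data) - start))
    return runs


if __name__ == "__main__":
    # Usage: python nullscan.py FILE [FILE...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for offset, length in find_null_runs(f.read()):
                print(f"{path}: {length} NUL bytes at offset {offset}")
```

Running this against the same file read on the NFS client and directly on the
server might help show on which side the null blocks first appear.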

This sounded much like the problem reported by Nathan Williams in PR
kern/13361, so we tried the workaround he suggested there (lowering the write
size), but that didn't help in our situation.

We also saw Chuck Silvers post a message to current-users on 4th July with the
subject line "Re: kern/13353: can not build libc when running a -current
kernel".  In it he said he'd applied a fix to ufs_inode.c that fixed the
recent libc building problem and that he thought might also fix Nathan's
problem.  But a kernel built from updated sources didn't help with our
problem, and Nathan reported that it didn't fix his either.

This problem also sounds very similar to one reported by Scott Presnel in a
posting to current-users and port-i386 on 16/6/2001 with a subject line "1.5
current NFS client problem?".  From a quick check of the archives I don't think
a solution to this was ever posted.

I realise that we haven't provided much information to help track this down.
But we *do* have a relatively straightforward way of triggering the problem:
writing a file from a Macintosh via netatalk to an NFS-mounted filesystem.
The result is that the resource fork of the file gets corrupted with null
bytes and the Mac application fails to reread the file.  So if anyone has any
ideas about what could be causing this problem, we are more than willing to
try out kernel patches and/or add debugging to provide further information...

>How-To-Repeat:
Complicated to explain, but we're happy to run tests if required.

>Fix:
Not known.
>Release-Note:
>Audit-Trail:
>Unformatted: