Subject: NFS/filesystem corruption
To: None <current-users@netbsd.org>
From: Frank van der Linden <vdlinden@fwi.uva.nl>
List: current-users
Date: 11/28/1994 13:15:51
Maybe someone who is using NetBSD as a fileserver has seen this too. I hope
so, because I don't have a clue what's going on here:

There's a 486/33 here with 64 MB RAM and a soon-to-be-replaced 1.2 GB ESDI
disk. It's acting as a fileserver for five 386/33 machines, four running
Linux and one running NetBSD 1.0. The exports file on the fileserver
(columbus) looks like this:

/usr -maproot=0 struis
/usr2 -alldirs -maproot=0 zatte crystal sponge warp struis
/var/mail -maproot=0 struis zatte

struis is the NetBSD client, the others are Linux. One of the Linux
clients mounts the filesystems like this:

columbus:/usr2/linux/usr
                      569683  266839   274359     49%   /usr
columbus:/usr2/home   569683  266839   274359     49%   /home
columbus:/var/mail     47663    9042    36237     20%   /var/spool/mail

..and the NetBSD client like this:
columbus:/usr            569068   136432   404182    25%    /usr
columbus:/usr2/home      569683   266839   274359    49%    /home
columbus:/var/mail        47663     9042    36237    20%    /var/mail


Now, here's the problem: after a while, corruption appears in the /usr2
filesystem, mainly in /usr2/home. Home directories disappear, fsck reports
lots of bad file types, etc. A sample of the fsck -n output (part of it):

** Phase 2 - Check Pathnames
UNALLOCATED  I=131341  OWNER=root MODE=0
SIZE=0 MTIME=Jan  1 01:00 1970 
NAME=/home/robbel

REMOVE? no

UNALLOCATED  I=131339  OWNER=root MODE=0
SIZE=3370632596655439886 MTIME=Jan  1 01:00 1970 
NAME=/home/halderen/linux/fs/isofs/symlink.c

REMOVE? no

UNALLOCATED  I=131340  OWNER=root MODE=0
SIZE=0 MTIME=Jan  1 01:00 1970 
NAME=/home/halderen/linux/fs/isofs/util.c

REMOVE? no

(etc)

The corruption seems to appear in groups of inodes; the above example is
part of a corruption of inodes 131332-13150. The corruption always appears
in /usr2, and mostly in /usr2/home. The fileserver has been up for 8 days,
and its filesystems were OK when it started up. But /usr2 seems to be
falling apart more each day.
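
A quick way to check the grouping is to pull the inode numbers out of the
fsck output and collapse them into contiguous runs. A minimal sketch in
Python, assuming the output has been saved as text and that every damaged
inode appears in an I=<number> field as in the sample above; the function
names are made up for illustration:

```python
import re

def inode_numbers(fsck_text):
    """Return the sorted, de-duplicated inode numbers found in 'I=<n>' fields."""
    return sorted({int(n) for n in re.findall(r"\bI=(\d+)", fsck_text)})

def contiguous_runs(numbers):
    """Collapse a sorted list of integers into (first, last) runs."""
    runs = []
    for n in numbers:
        if runs and n == runs[-1][1] + 1:
            # n directly follows the current run, so extend it
            runs[-1] = (runs[-1][0], n)
        else:
            # gap found, start a new run
            runs.append((n, n))
    return runs

# Three entries copied from the fsck -n sample above
sample = """\
UNALLOCATED  I=131341  OWNER=root MODE=0
UNALLOCATED  I=131339  OWNER=root MODE=0
UNALLOCATED  I=131340  OWNER=root MODE=0
"""

for first, last in contiguous_runs(inode_numbers(sample)):
    print(f"{first}-{last} ({last - first + 1} inodes)")
```

Run over the complete fsck output, a few long runs would support the idea
that whole ranges of inodes are being damaged at once, while many single
entries would point the other way.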

There were some problems with the ESDI drive (bad sectors), but there have
been no bad sector reports lately. The only thing that makes /usr2 different
from the other filesystems is that it is mounted by two different systems
(NetBSD and Linux).
/usr (only mounted by NetBSD) has never shown any problems. Is it an NFS
bug that only shows up when Linux clients use a NetBSD fileserver? Could
buggy client code cause this (this should be impossible)? It almost looks
to me as if a range of inodes has been overwritten in some way.
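
One way to test the "range of inodes overwritten" idea is to see whether
the damaged inodes would sit in the same on-disk inode block, since a single
misdirected block write would then take out a group of neighbouring inodes.
A rough sketch in Python, under assumptions that are not stated in this
post: an 8192-byte filesystem block size, 128-byte on-disk inodes, and a
number of inodes per cylinder group that is a multiple of the inodes per
block. The real values would have to be read from the superblock (for
example with dumpfs).

```python
BLOCK_SIZE = 8192   # assumed filesystem block size in bytes (not given in the post)
INODE_SIZE = 128    # assumed size of one on-disk inode in bytes

INODES_PER_BLOCK = BLOCK_SIZE // INODE_SIZE  # 64 under these assumptions

def inode_block_index(ino):
    """Index of the inode block holding inode `ino`, counted over all inode blocks."""
    return ino // INODES_PER_BLOCK

def same_block(inodes):
    """True if every inode in the list falls into the same inode block."""
    return len({inode_block_index(i) for i in inodes}) == 1

# Inode numbers taken from the fsck -n sample earlier in this message
damaged = [131339, 131340, 131341]

for ino in damaged:
    print(ino, "-> inode block", inode_block_index(ino))
print("all in one block:", same_block(damaged))
```

If the damaged inodes keep falling into whole blocks like this, that would
be consistent with block-sized writes landing in the wrong place, though it
would not by itself say whether the disk, the server or a client is at
fault.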

Ok, I'll stop whining here :) Any hints would be appreciated.

Frank