Subject: large FFSv2 -> file system corruption
To: None <current-users@netbsd.org>
From: Jan Schaumann <jschauma@netmeister.org>
List: current-users
Date: 07/19/2004 23:51:59

Hi,

I'm experiencing some serious problems on one of my machines with a
large FFSv2 file system.  The system in question uses a hardware RAID
and is partitioned as follows:

Filesystem         Size     Used     Avail Capacity  Mounted on
/dev/ld0a          9.8G     2.8G      6.6G    29%    /
/dev/ld0f          246G     226G      7.8G    96%    /stevens-cvs
/dev/ld0e           98G      40G       54G    42%    /mp3
/dev/ld0g          1.2T     202G      984G    17%    /anon-root

Partition /dev/ld0f contains well over 150 GB of data that actually
belong to /dev/ld0g, temporarily stored there.

The partition causing the problems is /dev/ld0g.  It showed such severe
filesystem corruption (duplicate inodes, fts_read failure, etc.) that I
eventually newfs'd it.  This partition contains a large ftp mirror for
various sites.  After newfs'ing it and repopulating the filesystem (by
pax'ing data from /dev/ld0f, or by rsyncing from the remote sites), I
repeatedly experienced kernel panics (without getting a dump; the
machine just hung on 'syncing disks') similar to what's reported in PR
kern/26254.

I thought it might possibly have to do with rsync, too, as the machine
went down more often while one of the rsync jobs was running (to sync
one of the ftp repositories) than at other times.

So I repeatedly newfs'd and repopulated, trying out various things.
Then I noticed that if I newfs the partition, populate the filesystem
with a significant chunk of data (several tens of GB), and then force
an fsck_ffs run on it, there are hundreds (!) of corrections to be made
(some dups, some bad blocks, some block count errors etc.).  The
problems are corrected and the filesystem is marked clean.  I continue
dumping data onto it, force another fsck_ffs, and again there are
errors!
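
For anyone who wants to try reproducing this, the cycle above could be
scripted roughly as follows.  This is only a sketch under assumptions:
the staging directory in SRC is a hypothetical path, the raw device name
/dev/rld0g is assumed from the partition table above, and it assumes the
filesystem is unmounted before the forced check.  It defaults to a dry
run that only prints the commands, since newfs destroys whatever is on
the device.

```shell
#!/bin/sh
# Sketch of the newfs / populate / forced-fsck cycle described above.
# DRYRUN=1 (the default) only prints the commands; DRYRUN=0 runs them
# and DESTROYS all data on $DEV.
DEV=${DEV:-/dev/rld0g}              # raw device (assumed name)
BLK=${BLK:-/dev/ld0g}               # block device from the df output
MNT=${MNT:-/anon-root}              # mount point from the df output
SRC=${SRC:-/stevens-cvs/staging}    # hypothetical staging directory
DRYRUN=${DRYRUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_cycle() {
    run newfs -O 2 "$DEV"           # create a fresh FFSv2 filesystem
    run mount "$BLK" "$MNT"
    run cd "$SRC"                   # pax copies paths relative to cwd
    run pax -rw -pe . "$MNT"        # populate with tens of GB of data
    run cd /
    run umount "$MNT"
    run fsck_ffs -f "$DEV"          # force a check even if marked clean
}

repro_cycle
```

Run as-is it just lists the steps; set DRYRUN=0 (and double-check DEV)
to actually execute them.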

I'm not sure what it is, but it seems to be a filesystem problem that
only occurs after adding a significant amount of data.

FWIW, I have another system with an 880 GB partition, to which it
regularly dumps up to a few hundred GB, and it does not show this
behaviour.

Both are 2.0_BETA/i386 systems from sources around the beginning of the
month.

Any ideas?

-Jan

-- 
DON'T PANIC!
