Subject: Re: Data corruption issues possibly involving cgd(4)
To: Steven M. Bellovin <smb@cs.columbia.edu>
From: Daniel Carosone <dan@geek.com.au>
List: current-users
Date: 01/18/2007 07:31:47

On Wed, Jan 17, 2007 at 10:02:11AM -0500, Steven M. Bellovin wrote:
> Is there any chance the two different mirrors -- you did say RAID,
> right, though I confess I don't remember which variant -- have
> different versions of the block?  That shouldn't happen, of course, but
> if it did it would explain the problem.

It's RAID5, from the original post.  One of the first ideas I had and
eliminated, alas.

Nino, are you running a kernel with DIAGNOSTIC and/or DEBUG?  Looking
at the cgd panic you found, I'm guessing not, because the path we see
to that problem would most likely have triggered one or more
DIAGNOSTIC messages.

If you're able, adding those options would probably be a very good
idea at this point, especially as filesystem issues are looking more
and more likely.  The combination of ffsv2, >1TB, and older kernels
smells fishy to me, and any additional clues those options provide
could be vital.  Reproducing that combination on a test machine,
without cgd and RAID5, would also be a good idea if feasible.
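
For reference, a minimal sketch of what enabling those options might
look like in a custom kernel config file.  This assumes a config
derived from GENERIC; the rest of the file and the port it is built
for are not known from this thread, and both options can slow the
kernel down noticeably.

```
# Assumed additions to a custom kernel config (file name and port
# are not specified in this thread).
options 	DIAGNOSTIC	# extra internal consistency checks
options 	DEBUG		# extra debugging checks and support
```

The kernel has to be rebuilt and booted before any of the additional
checks take effect.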

--
Dan.
