Subject: Re: Data corruption issues possibly involving cgd(4)
To: Nino Dehne <ndehne@gmail.com>
From: Daniel Carosone <dan@geek.com.au>
List: current-users
Date: 01/16/2007 17:24:12

On Tue, Jan 16, 2007 at 06:58:09AM +0100, Nino Dehne wrote:
> I am currently experiencing data corruption using 4.0_BETA2 from around
> mid-december.

I'm sorry to hear it.  You certainly seem to have taken the obvious
steps to eliminate other possible sources of the problem.

I can offer you some reassurance that I don't see the same problem,
and haven't ever in a long history of using cgd.  I have just tried to
specifically reproduce your test on a plain cgd-on-wd, without hitting
a different hash value.

I'm not sure where your problem lies, but it's not a simple one.

> ACPI is enabled. This is an Athlon 64 X2 3600+ EE on an ASRock ALiveSATA2-GLAN
> and an MDT 512M stick of DDR2-800 RAM. The box has 5 drives in a RAID5 using
> raid(4) and a cgd(4) on top of that.

...

> 3) same as 1) but the file resides on a non-cgd partition on a RAID1 using
>    raid(4): the problem also does _not_ occur. I aborted the hashing after
>    100 runs where the problem would have shown up with certainty in 1).

Any chance you could test with a RAID5 - ideally the same RAID5 -
without cgd?  It could be a controller or drive problem, or even a
power supply problem that only shows up when all drives are active.
RAID1 won't necessarily hit those conditions, especially for reads.

You could probably achieve the same result by dd'ing a constant chunk
of encrypted data off the raid(4) device and checksumming it, avoiding
the need to destroy or remake filesystems.  If you reproduce the
problem like this, you have also eliminated filesystem bugs.
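
As a rough sketch of that kind of test (an illustration, not something
taken from the original thread), the Python below reads the same fixed
region of a device or file repeatedly and counts how many distinct
digests turn up.  The function names, the default sizes and the choice
of SHA-1 are all assumptions made for the example; so is whatever
device path you pass in.  Note that repeated reads may be served from
cache unless the region is larger than RAM or a raw device is used, in
which case reads should stay sector-aligned.

```python
import hashlib
from collections import Counter


def hash_chunk(path, offset=0, length=64 * 1024 * 1024, block_size=1024 * 1024):
    """Return the SHA-1 hex digest of `length` bytes at `offset` in `path`."""
    digest = hashlib.sha1()
    remaining = length
    # buffering=0 avoids Python-level buffering; it does not bypass the
    # operating system's own cache.
    with open(path, "rb", buffering=0) as dev:
        dev.seek(offset)
        while remaining > 0:
            data = dev.read(min(block_size, remaining))
            if not data:  # reached the end of the device or file
                break
            digest.update(data)
            remaining -= len(data)
    return digest.hexdigest()


def repeat_hash(path, runs=100, **kwargs):
    """Hash the same region `runs` times; return a Counter of digests seen."""
    return Counter(hash_chunk(path, **kwargs) for _ in range(runs))


# Hypothetical usage: the path is whatever device or file you want to test.
#   counts = repeat_hash(path_to_device, runs=100)
#   more than one key in `counts` means two reads of the same data differed.
```

If more than one digest shows up for data that has not been written to
in the meantime, the corruption is happening below the filesystem.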

For comparison, Sun's ZFS has shown up these kinds of problems (power,
controller concurrency) in marginal hardware on a number of occasions.

> Please help, I'm at a loss.

It's a tricky one, but the above would be my next guess, and the next
useful thing to try to eliminate.

--
Dan.
