Subject: Re: Data corruption issues possibly involving cgd(4)
To: Nino Dehne <ndehne@gmail.com>
From: Daniel Carosone <dan@geek.com.au>
List: current-users
Date: 01/17/2007 08:32:50

On Tue, Jan 16, 2007 at 08:49:14PM +0100, Nino Dehne wrote:
> > considerable numbers of seeks.  It is the seeks that cause the disks to
> > draw current bursts from the psu - so don't discount that.
>
> Good point. To account for that, I repeatedly cat'ed the test file on the
> cgd partition to /dev/null. At the same time, I hashed the first 64M of
> rcgd0d in a loop. I used 64M instead of 256M because the disk thrashing
> was really bad. I also set the CPU frequency to its maximum to maximize
> the power the system draws.

A CPU-hog process would help here too.
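As a rough illustration of the CPU-hog idea, here is a minimal Python sketch, assuming Python is available on the machine under test. The names `spin` and `hog_all_cpus` are hypothetical; the sketch simply runs one busy-looping worker per CPU so the processor stays at full load (and full power draw) while the disk readers run.

```python
import multiprocessing as mp
import time


def spin(seconds):
    """Busy-loop for `seconds` of wall-clock time, keeping one CPU fully loaded.

    Returns the number of loop iterations, only so the work is observable.
    """
    deadline = time.monotonic() + seconds
    n = 0
    while time.monotonic() < deadline:
        n += 1
    return n


def hog_all_cpus(seconds):
    """Run one spinning worker per CPU in parallel; return per-worker iteration counts."""
    ncpu = mp.cpu_count()
    with mp.Pool(ncpu) as pool:
        return pool.map(spin, [seconds] * ncpu)
```

Started in a separate terminal, e.g. `hog_all_cpus(3600)` called from under an `if __name__ == "__main__":` guard (which multiprocessing requires on platforms that spawn rather than fork workers), it keeps every core busy for an hour alongside the cat and hash loops.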

> I attribute the checksum change to changes on the filesystem, since that
> was obviously mounted while doing the test.

Probably, yeah; I gave some suggestions for ways to avoid this a
moment ago, too.

> Getting over 70 equal checksums and then 3 equal other checksums in
> a row with flaky hardware seems highly improbable to me.

Or is the 64M fitting in the cache most of the time, so that the bad
read was cached and thus repeated?

> i.e. mismatch at the 3rd run. I seriously doubt that the 70+ successful
> runs on the rcgd0d device were pure luck.

Please try some of the other variants I suggested.  Perhaps try
varying the block size of the dd, too.  If these rule out seeking,
then the next likely culprit is the filesystem :-/.
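To make the repeated-read test easier to vary, here is a minimal Python sketch. It assumes the raw device is `/dev/rcgd0d` (the device named above) and that both the block size and the total length are multiples of the sector size, since raw devices generally reject unaligned reads. `hash_prefix` and `repeat_hash` are hypothetical helper names: they hash the first N bytes in fixed-size chunks and tally how many distinct digests appear across runs.

```python
import hashlib
from collections import Counter


def hash_prefix(path, total_bytes, block_size):
    """SHA-1 of the first `total_bytes` of `path`, read in `block_size` chunks.

    For a raw device, keep total_bytes a multiple of block_size and
    block_size a multiple of the sector size so every read stays aligned.
    """
    h = hashlib.sha1()
    remaining = total_bytes
    # buffering=0: each f.read() is one read(2) of the requested size
    with open(path, "rb", buffering=0) as f:
        while remaining > 0:
            chunk = f.read(min(block_size, remaining))
            if not chunk:
                break  # hit end of file/device: hash whatever exists
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()


def repeat_hash(path, total_bytes, block_size, runs):
    """Hash the same region `runs` times; map each digest to how often it appeared."""
    return Counter(hash_prefix(path, total_bytes, block_size) for _ in range(runs))
```

For example, `repeat_hash("/dev/rcgd0d", 64 * 1024 * 1024, 65536, 100)` should yield a single digest with count 100 on healthy hardware, provided nothing writes to that region in the meantime (e.g. with the filesystem unmounted). Rerunning with 512 or 1048576 as the block size changes the I/O pattern without changing the data being hashed.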

--
Dan.