Subject: Re: Data corruption issues possibly involving cgd(4)
To: Daniel Carosone <firstname.lastname@example.org>
From: Nino Dehne <email@example.com>
Date: 01/16/2007 22:44:14
On Wed, Jan 17, 2007 at 08:32:50AM +1100, Daniel Carosone wrote:
> On Tue, Jan 16, 2007 at 08:49:14PM +0100, Nino Dehne wrote:
> > > considerable numbers of seeks. It is the seeks that cause the disks to
> > > draw current bursts from the psu - so don't discount that.
> > Good point. To accommodate that, I repeatedly cat'ed the test file on the
> > cgd partition to /dev/null. At the same time, I hashed the first 64M of rcgd0d
> > in a loop. I used 64M instead of 256M because the disk thrashing was really
> > bad. I also set the CPU frequency to its maximum to maximize the power the
> > system draws.
> a cpu-hog process would help here too..
While doing the above, the CPU is about 0%-8% idle. I'm still running a
> > I attribute the checksum change to changes on the filesystem, since that was
> > obviously mounted while doing the test.
> Probably, yeah; I gave some suggestions for ways to avoid this a
> moment ago, too.
I'll have a look. Your other mail just arrived due to connectivity problems
> > Getting over 70 equal checksums and then 3 equal other checksums in
> > a row with flaky hardware seems highly improbable to me.
> Or the 64m is fitting in cache most of the time, and the bad read was
> cached and thus repeated?
Just doing the hashing from rcgd0d leaves the disks active 100%. I think
dd from a raw device is not cached.
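
For reference, the repeated-hashing test described above can be sketched roughly as follows. This is only an illustration, not the exact commands used: the function names hash_prefix and checksum_runs are made up for this sketch, and MD5 is an assumption since the thread does not say which hash was used. It hashes the first N bytes of a path several times and collapses consecutive equal checksums into runs, so "70 equal, then 3 equal others" would show up as two entries.

```python
import hashlib
from itertools import groupby


def hash_prefix(path, length, block_size=64 * 1024):
    """Return the MD5 hex digest of the first `length` bytes of `path`.

    MD5 is an assumption; any stable hash works for spotting changes.
    """
    digest = hashlib.md5()
    remaining = length
    # buffering=0 disables Python's own buffering only; whether the OS
    # caches the reads depends on the device node being opened.
    with open(path, "rb", buffering=0) as f:
        while remaining > 0:
            chunk = f.read(min(block_size, remaining))
            if not chunk:  # hit end of file/device early
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def checksum_runs(path, length, runs):
    """Hash the prefix `runs` times; return [(digest, consecutive_count), ...]."""
    digests = [hash_prefix(path, length) for _ in range(runs)]
    return [(d, len(list(group))) for d, group in groupby(digests)]
```

A single entry in the result means every run agreed. On a raw device, read sizes typically have to be a multiple of the sector size, so length should be a multiple of block_size there.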
> > i.e. mismatch at the 3rd run. I seriously doubt that the 70+ successful runs
> > on the rcgd0d device were pure luck.
> Please try some of the other variants I suggested. Perhaps try
> varying the block size of the dd, too. If these eliminate seeking,
> then the next possible culprit is probably the filesystem :-/.
Gonna do this right away.
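
A rough sketch of the block-size variation, assuming MD5 again and a made-up helper name digest_by_block_size: it reads the same prefix once per block size and records the checksum for each. If the data is stable, all checksums should be identical regardless of block size.

```python
import hashlib


def digest_by_block_size(path, length, block_sizes):
    """Hash the first `length` bytes of `path` once per block size.

    Returns {block_size: md5_hex_digest}. Hypothetical helper; MD5 is
    an assumption, not necessarily the hash used in the original test.
    """
    results = {}
    for block_size in block_sizes:
        digest = hashlib.md5()
        remaining = length
        with open(path, "rb", buffering=0) as f:
            while remaining > 0:
                chunk = f.read(min(block_size, remaining))
                if not chunk:  # end of file/device reached early
                    break
                digest.update(chunk)
                remaining -= len(chunk)
        results[block_size] = digest.hexdigest()
    return results
```

For example, digest_by_block_size(path, 64 * 1024 * 1024, [512, 65536, 1048576]) covers small, medium and large reads of the same 64M; more than one distinct value in the result points at a bad read somewhere.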
Thanks and regards,