NetBSD-Users archive


Re: Corrupting Files




We changed the network buffer size from 4000 bytes to 16384 bytes, and corruption dropped from 80% to 10% (probably just because we decreased the number of writes).

I also noticed that only HDDs wd2 and wd3 had corruption, while wd0 and wd1 did not (all four are on the sil3114 controller).

We applied #37519 to our 4.0.1 kernel.
http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=37519

and in 4 days we have had no corruption at all. I don't know if it is fixed, but the frequency is now low enough that it no longer matters.

Thanks for the support,

Lund


Manuel Bouyer wrote:
On Mon, Dec 22, 2008 at 08:02:20PM +0900, Jorgen Lundman wrote:
[...]

If it is a disk issue, the 8-skip-4-then-4 byte change would mean two consecutive calls to write() and seek(). I then thought it more likely that the in-memory cache is being corrupted, but why would the buffers be flushed out so much later? Or is it possible that the file "on disk" is good, but the in-core read cache of the file is bad? (If so, the file would be good again after a umount/mount; maybe I will try that too.)

It is possible, yes. You can try to get the file out of the cache
(maybe waiting long enough will do, or run something that reads a lot
from disk to fill the buffer cache), and see if the file is still
corrupted after re-reading from disk.
You can also use dump/restore to read directly from disk.
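
One way to make that comparison concrete is to diff the suspect file against a known-good copy and list the byte ranges that differ, which also shows whether a pattern such as 8-skip-4-then-4 repeats. The following is only a minimal sketch: the function name diff_runs and the idea of keeping a reference copy are assumptions for illustration, not something from this thread. It reads through the normal file interface, so it only tests the on-disk data if the file has already left the cache (for example after a umount/mount).

```python
def diff_runs(path_a, path_b, chunk_size=65536):
    """Return a list of (offset, length) runs where the two files differ.

    Hypothetical helper: bytes past the end of the shorter file count as
    differing, so a truncated copy shows up as one trailing run.
    """
    runs = []
    run_start = None  # offset where the current differing run began
    offset = 0        # file offset of the start of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            n = max(len(a), len(b))
            for i in range(n):
                same = i < len(a) and i < len(b) and a[i] == b[i]
                if not same and run_start is None:
                    run_start = offset + i
                elif same and run_start is not None:
                    runs.append((run_start, offset + i - run_start))
                    run_start = None
            offset += n
    # Close a run that extends to the end of the longer file.
    if run_start is not None:
        runs.append((run_start, offset - run_start))
    return runs
```

Calling diff_runs("reference.bin", "suspect.bin") (both paths hypothetical) before and after flushing the cache would show whether the differing ranges disappear once the data is re-read from disk.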

But I've also seen hard disks silently corrupting files (you write something,
read it later and you don't get what you wrote). I'm almost sure it
was the disk because replacing the disk with another one from the same
batch solved the problem.


--
Jorgen Lundman       | <lundman%lundman.net@localhost>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)

