NetBSD-Users archive


Corrupting Files




Hello list,

kernel: NetBSD 4.0.1 i386
userland: 3.0.1
Role: FTP server
driver: cgd on Silicon Image SATALink 3114

We are currently experiencing problems with file corruption. We have watched a file being uploaded and verified that it was 100% valid (and also made a file.good copy). Then, at some point later, the file is corrupted.

I'm just trying to get a feel for where the problem may lie. What is strange is that the "mtime" of the file is not updated. Is it possible to write to files from userland without updating "mtime"?
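As far as I know, the one ordinary userland route would be to write to the file and then put the old timestamp back with utimes(2), which should still change "ctime". A minimal sketch of that idea, so that ctime can be checked alongside mtime; the function name is my own invention, and this is only an illustration, not something I have seen happening on the server:

```python
import os


def rewrite_keeping_mtime(path, offset, new_bytes):
    """Overwrite bytes in place, then restore the previous atime/mtime.

    Returns the stat results from before and after, so the caller can
    see that mtime is unchanged while ctime is not under our control.
    """
    before = os.stat(path)
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)
    # utimes(2) equivalent: sets atime and mtime only; ctime is updated
    # by the kernel and cannot be set through this call.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return before, os.stat(path)
```

If ctime on the corrupted file is also unchanged, a userland write followed by a timestamp reset looks less likely.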

For example:

-r--r--r--  1 10001  10001  15000000 Dec 22 08:25 file.bin
-r--------  1 root   wheel  15000000 Dec 22 08:26 file.bin.newgood


As you can see, I made the "file.bin.newgood" backup about 1 minute after "file.bin" was completed. I wrote a script to compare the two files every minute.
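The comparison is roughly along these lines. This is a sketch rather than the exact script I run; the function name and chunk size are arbitrary, and it does a single pass, so it would be started from cron or a loop once a minute:

```python
import sys


def diff_offsets(path_a, path_b, chunk_size=65536):
    """Return the byte offsets at which the two files differ."""
    offsets = []
    base = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                # Only walk the chunk byte by byte when it is known to differ.
                for i in range(max(len(a), len(b))):
                    byte_a = a[i] if i < len(a) else None
                    byte_b = b[i] if i < len(b) else None
                    if byte_a != byte_b:
                        offsets.append(base + i)
            base += max(len(a), len(b))
    return offsets


if __name__ == "__main__" and len(sys.argv) == 3:
    for off in diff_offsets(sys.argv[1], sys.argv[2]):
        print("%08X" % off)
```

Run as, for example, "python compare.py file.bin file.bin.newgood" (the script name is made up); it prints nothing while the files are identical.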

About 17 minutes later:

+0019C0F0 4C CE 8F 72 1B 18 CC 3E 52 A2 2D 7A A0 CA AF 41 |L..r....R..z...A|
-0019C0F0 4C CE 8F 72 1B 18 CC 3E DE F6 C0 E0 17 9F 70 6A |L..r..........pj|

+0019C100 BB 5F 6B F3 D3 B3 4D E0 73 38 05 39 E8 C6 F7 BB |..k...M.s8.9....|
-0019C100 BB 5F 6B F3 D2 03 11 CE 73 38 05 39 E8 C6 F7 BB |..k.....s8.9....|

+001A80F0 F2 7B 78 12 35 7E 26 EE F0 34 0D 0B 6F 7B 4A 4D |..x.5....4..o.JM|
-001A80F0 F2 7B 78 12 35 7E 26 EE F6 F9 CB 6F BF CF 7E 0A |..x.5......o....|

+001A8100 35 73 82 7F DD 11 56 98 FE 95 DE 71 05 34 81 B7 |5s....V....q.4..|
-001A8100 35 73 82 7F 3A DD 76 D0 FE 95 DE 71 05 34 81 B7 |5s....v....q.4..|


And yet:

-r--r--r--  1 10001  10001  15000000 Dec 22 08:25 file.bin
-r--------  1 root   wheel  15000000 Dec 22 08:26 file.bin.newgood


If it is corrupting on disk (cache?) after it has been written, can I rule out userland problems? Should I look at the memory sticks, the cgd implementation, or perhaps even a kernel vs userland mismatch?

What is also strange is that the differences are nearly always (at least in the files I have checked) around offsets xxxxx0F0-xxxxx100: 8 bytes changed, 4 unchanged, 4 bytes changed.
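To make the pattern explicit, here is a small sketch that takes the four hex rows from the diff above, finds the differing bytes, and groups them into runs. The helper names are made up for illustration; the byte values are copied from the dump:

```python
# (row offset) -> ("+" side bytes, "-" side bytes), copied from the diff.
ROWS = {
    0x0019C0F0: ("4C CE 8F 72 1B 18 CC 3E 52 A2 2D 7A A0 CA AF 41",
                 "4C CE 8F 72 1B 18 CC 3E DE F6 C0 E0 17 9F 70 6A"),
    0x0019C100: ("BB 5F 6B F3 D3 B3 4D E0 73 38 05 39 E8 C6 F7 BB",
                 "BB 5F 6B F3 D2 03 11 CE 73 38 05 39 E8 C6 F7 BB"),
    0x001A80F0: ("F2 7B 78 12 35 7E 26 EE F0 34 0D 0B 6F 7B 4A 4D",
                 "F2 7B 78 12 35 7E 26 EE F6 F9 CB 6F BF CF 7E 0A"),
    0x001A8100: ("35 73 82 7F DD 11 56 98 FE 95 DE 71 05 34 81 B7",
                 "35 73 82 7F 3A DD 76 D0 FE 95 DE 71 05 34 81 B7"),
}


def differing_offsets(rows):
    """Absolute file offsets where the two sides of each row differ."""
    offsets = []
    for base, (plus, minus) in sorted(rows.items()):
        a = bytes.fromhex(plus)
        b = bytes.fromhex(minus)
        offsets.extend(base + i for i, (x, y) in enumerate(zip(a, b)) if x != y)
    return offsets


def group_runs(offsets):
    """Collapse consecutive offsets into (start, length) runs."""
    runs = []
    for off in sorted(offsets):
        if runs and off == runs[-1][0] + runs[-1][1]:
            runs[-1][1] += 1
        else:
            runs.append([off, 1])
    return [(start, length) for start, length in runs]


if __name__ == "__main__":
    for start, length in group_runs(differing_offsets(ROWS)):
        # Also show the position inside a 512-byte sector.
        print("%08X len=%d sector_off=%03X" % (start, length, start & 0x1FF))
```

For the rows above this gives runs of 8 and 4 bytes starting at 0019C0F8 and 0019C104, and again at 001A80F8 and 001A8104, so the two damaged regions are 0xC000 bytes apart.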

The server (userland) has performed without issues for 8 years, but the hardware was replaced about 2 months ago. I'm currently trying to determine whether it happens only on certain disks/partitions, and/or to ktrace it (but on an FTP server ktrace is far too verbose). It is very random; I cannot predict when or where it will happen.


