NetBSD-Users archive


Re: MD5 failing on optical media



Thor might be on to something regarding cp's use of read() vs mmap().

I'll summarize my long post which I only sent to the mailing list:

I had similar problems on a USB hard disk, which had 90,000 files on it.  A
few would have sha256 checksum differences (from what was expected on a
backup copy).  I wrote the sha256 program myself, and discovered that the
read() function returned 0 (EOF) BEFORE the actual end of the file was
reached!  So the checksum was wrong.

Some affected files were small (1 MB), some were larger ISO files (400 MB).
Maybe some 4 GB files failed, can't remember.  The files affected were
random and different every time.

Something I forgot to mention was that this USB disk uses 4 KB sectors:
http://mail-index.netbsd.org/netbsd-users/2012/03/03/msg010180.html

I had also inserted the sha256 code into the cp.c code to compute the
checksum at the same time as copying.  I think I recall that cp uses read()
or mmap() depending on the size of the file to be copied.  I can't remember
if this version computed correct checksums 100% of the time or not.
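
One way to test the read() vs mmap() idea directly is to fetch the same file
through both paths and compare the bytes.  The following is only a sketch under
assumptions: the function name read_matches_mmap() and the 64 KB buffer are
made up for illustration, and it is not taken from cp.c.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if read() delivers exactly the bytes that
 * an mmap() of the same file shows, 0 if the data differs or read() hits
 * EOF early, and -1 on error (an empty file is also reported as -1).
 */
static int read_matches_mmap(const char *path)
{
    static unsigned char buf[64 * 1024];
    struct stat st;
    unsigned char *map;
    off_t off = 0;
    ssize_t n = 0;
    int same = 1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    while (same) {
        n = read(fd, buf, sizeof buf);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, try again */
        if (n <= 0)
            break;                  /* 0 = EOF as seen by read(), <0 = error */
        if (off + n > st.st_size || memcmp(map + off, buf, (size_t)n) != 0)
            same = 0;               /* the two paths disagree */
        else
            off += n;
    }
    if (same && n == 0 && off != st.st_size)
        same = 0;                   /* read() reported EOF before st_size */

    munmap(map, (size_t)st.st_size);
    close(fd);
    return n < 0 ? -1 : same;
}
```

Running this over the files that fail the checksum would show whether the
mapped view holds the full data while read() stops short.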

Anyhow, something is wrong.  And I suppose it is still possible that the
disk drive has a flaw too.

I also can't remember (not near the systems right now) if the machine
showing these symptoms was NetBSD or FreeBSD.  I assume that the system code
at this low level might be the same.  This may not be a valid assumption on
my part, but the symptoms seemed SO very similar to the original poster's.

John Refling



_____________________________________________
From: John Refling [mailto:netbsdrat%gmail.com@localhost] 
Sent: Wednesday, August 21, 2013 5:18 AM
To: netbsd-users%NetBSD.org@localhost
Subject: Re: MD5 failing on optical media






I have had similar problems, although in some respects quite different,
regarding USB hard disk vs. SATA/PATA hard disk.

Let me try to explain from memory:

I have a mix of FreeBSD and NetBSD systems.  So I might be using a NetBSD
formatted disk on FreeBSD AND vice versa.  This might be the issue, but read
on.  I can track down and replicate the details in a few days, and retest
things as needed.

Anyway, I wrote an md5 and sha256 program from the spec by cutting and
pasting, quite simple really, just an init() stage, an update() stage
(repeated with each INPUTBUFFSIZE block of your data), and a finalize()
stage, and print out the result.
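
The init/update/finalize structure described above can be sketched roughly as
follows.  To keep it short, a 64-bit FNV-1a hash is swapped in for md5/sha256;
it is NOT the digest the post used, and the names hash_ctx, hash_init(),
hash_update(), hash_finalize() and hash_fd() are assumptions for illustration.
Only INPUTBUFFSIZE comes from the description.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define INPUTBUFFSIZE (1024 * 1024)

/* Stand-in digest state: 64-bit FNV-1a, not md5 or sha256. */
typedef struct { uint64_t state; } hash_ctx;

/* init() stage: load the FNV-1a offset basis. */
static void hash_init(hash_ctx *c)
{
    c->state = 0xcbf29ce484222325ULL;
}

/* update() stage: fold one block of input into the running state. */
static void hash_update(hash_ctx *c, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c->state ^= p[i];
        c->state *= 0x100000001b3ULL;   /* FNV-1a 64-bit prime */
    }
}

/* finalize() stage: FNV-1a needs no padding, so just return the state. */
static uint64_t hash_finalize(const hash_ctx *c)
{
    return c->state;
}

/*
 * Hash everything read() delivers from fd.  Returns 0 on success and -1 on
 * a read error.  A read() that returns 0 early silently ends the loop,
 * which is exactly how a short file would produce a wrong checksum.
 */
static int hash_fd(int fd, uint64_t *out)
{
    static unsigned char buf[INPUTBUFFSIZE];
    hash_ctx c;
    ssize_t n;

    hash_init(&c);
    while ((n = read(fd, buf, sizeof buf)) > 0)
        hash_update(&c, buf, (size_t)n);
    if (n < 0)
        return -1;
    *out = hash_finalize(&c);
    return 0;
}
```

The result does not depend on how the input is split into blocks, so any
difference between two runs over the same file has to come from the bytes
read() returned.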

I have 3 identical archives of approx 90,000 files, none really huge.  One
on USB hard disk, one on SATA hard disk and one on PATA hard disk, on
different machines (some NetBSD, some FreeBSD).  These disks are all copies
of each other, made either over the network or locally.

When I run my md5/sha256 on all the files on the SATA and PATA, everything
matches perfectly.

However, on the USB disk, there seem to be about 4-5 files that fail to
match the checksum (both md5/sha256 as will be seen later) of the same files
(by filename, and expected same copies) on the PATA/SATA disks.

Every time I run the md5/sha256 checksum on the USB hard disk, a different
set of 4 to 6 files (out of 90,000) fails to compare (different checksum).

More bizarre, when I do a 'cmp' from USB disk to PATA/SATA the files compare
OK.  Also, when I copy them from the USB to local tmp area on PATA/SATA, the
checksum of the local file is now correct (matches PATA/SATA), similar to
the OP.

Since I wrote (copied) the md5/sha256 program, I was curious as to what was
happening.  I tossed in some debugging info:

What I discovered was that my update() loop on the few files that failed
checksum would always quit on exactly a multiple of the INPUTBUFFSIZE used
in the read() call.

So if my read() INPUTBUFFSIZE was 1024 k bytes, and the file was 4 x 1024 k
bytes + SOME_SMALL_NUMBER (say 100), the reads would go:
Read #1 gets 1024 k bytes, update(), chksum OK
Read #2 gets 1024 k bytes, update(), chksum OK
Read #3 gets 1024 k bytes, update(), chksum OK
Read #4 gets 1024 k bytes, update(), chksum OK
Read #5 returns 0, no more bytes, even though 100 (say) bytes left in file
to compute correct checksum..... THEREFORE CHECKSUM WRONG !
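
Debugging output of the kind shown above could come from a loop like this one.
It is a sketch, not the original program: the name traced_read_all() and the
log format are assumptions.  It logs every read() to stderr and keeps a running
total, so an unexpected 0 return shows up next to the byte count reached so far.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read fd until read() returns 0, logging each call.
 * Returns the total number of bytes delivered, or -1 on error.
 */
static off_t traced_read_all(int fd, size_t bufsize)
{
    unsigned char *buf = malloc(bufsize);
    off_t total = 0;
    int nread = 0;

    if (buf == NULL)
        return -1;
    for (;;) {
        ssize_t n = read(fd, buf, bufsize);

        if (n < 0 && errno == EINTR)
            continue;               /* interrupted before any data: retry */
        nread++;
        if (n < 0) {
            perror("read");
            free(buf);
            return -1;
        }
        fprintf(stderr, "Read #%d returns %zd (total so far %lld)\n",
                nread, n, (long long)(total + n));
        if (n == 0)
            break;                  /* read() claims end of file */
        total += n;
    }
    free(buf);
    return total;
}
```

Note that a read() shorter than the buffer is legal and is not treated as the
end of the data here; only a return of 0 ends the loop.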

I also added a bit of code to give a warning if the cumulative # of bytes
returned by read() does not equal the stat.st_size of the file.  Same
result: the few files that lost bytes on the read() also (obviously) did not
match the # of bytes that stat() reported for the file.  Stat() was correct,
but read() read LESS than the stat() file length and returned a 0 (no more
data) early.  UNLESS WE NEED TO WAIT FOR A TIMEOUT OR FOR DELAYED DATA TO
BECOME AVAILABLE.
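
The st_size cross-check can be sketched like this.  The function name
size_matches_read(), the 64 KB buffer and the warning text are assumptions for
illustration; only the idea of comparing stat.st_size with the summed read()
byte counts comes from the description above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the bytes delivered by read() add up to
 * st_size, 0 (with a warning on stderr) if they do not, and -1 on error.
 */
static int size_matches_read(const char *path)
{
    static unsigned char buf[64 * 1024];
    struct stat st;
    off_t total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    for (;;) {
        n = read(fd, buf, sizeof buf);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, try again */
        if (n <= 0)
            break;                  /* 0 = EOF as seen by read(), <0 = error */
        total += n;
    }
    close(fd);
    if (n < 0)
        return -1;
    if (total != st.st_size) {
        fprintf(stderr, "%s: stat() says %lld bytes, read() delivered %lld\n",
                path, (long long)st.st_size, (long long)total);
        return 0;
    }
    return 1;
}
```

The check is only meaningful for regular files that are not being modified
while they are read.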

So on each run a tiny # of files (4-5 or 5-6, DIFFERENT FILES *EVERY* TIME
IT IS RUN) lose a small # of bytes in the read() loop (which obviously
causes the checksum to be entirely different). [Read() loop gets a 0 and
terminates early, with a few bytes left that SHOULD BE READ() but are not].

This is only observed on the USB hard disk, not SATA/PATA.  I only have one
USB hard disk under test, so this statement is not really definitive as to
cause.

I initially assumed this was a hardware defect in the USB hard disk (might
still be), or maybe in a driver, or ???...

FYI,

John Refling

Maybe the common thing with the OP and me is 'USB' (CD/HD)



