Current-Users archive


Re: Strange system behavior



On Tue, 21 Sep 2010, Brian Buhrow wrote:

        Hello.  I wonder if you can get a file of md5 checksums on all the
files in your build tree and then write a script to repeatedly check those
checksums.  This might give you a clue as to the nature of the problem.
For example, you could copy the build tree to another machine, get the
checksums there, and then run the checksum test against the copy on the
faulty machine.  Running that for a day or two might yield interesting results.
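
A minimal sketch of the checksum test suggested above, written in Python purely for illustration. The function names (`build_manifest`, `verify_manifest`, `watch`) are hypothetical and not from the original message. It assumes the manifest is built from a copy of the tree that is believed to be good, then checked repeatedly against the tree on the suspect machine.

```python
import hashlib
import os
import time


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = md5_of_file(full)
    return manifest


def verify_manifest(root, manifest):
    """Return the sorted relative paths that are missing or whose digest changed."""
    bad = []
    for rel, expected in manifest.items():
        try:
            actual = md5_of_file(os.path.join(root, rel))
        except OSError:
            actual = None  # unreadable or missing counts as a mismatch
        if actual != expected:
            bad.append(rel)
    return sorted(bad)


def watch(root, manifest, passes, delay_seconds=0.0):
    """Verify the tree `passes` times; return one mismatch list per pass."""
    history = []
    for _ in range(passes):
        history.append(verify_manifest(root, manifest))
        time.sleep(delay_seconds)
    return history
```

If the same paths show up in every pass, that points toward data that is bad on disk. Mismatches that move around between passes would look more like a memory, bus, or controller problem.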

Interesting you should suggest this...

Several months ago I upgraded the machine's previous incarnation to a more recent -current. After the upgrade, I did a dump of the root file system, with output to a file on another partition, and gzipped the result. I then took an MD5 checksum of the .dump.gz file and tried to copy it across my local network. I made multiple copies (one at a time!) of this 10GB file to a remote NFS partition, and every time I copied, the _copy_ had a different MD5 - not only different from the original, but different from all previous copy attempts. I then tried using ftp (both "push" and "pull") for the copy, and it also failed with apparently random MD5 checksums! A few days later, I was able to make the copy successfully, without having changed anything!
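
The copy-and-compare experiment described above can be sketched roughly as follows. This is an illustrative Python sketch only: `copy_and_check` and `first_difference` are hypothetical names, and a plain local file copy stands in for the NFS and ftp transfers. Locating the offset of the first differing byte is an addition that is not in the original message; it might help show whether the corruption lands at a consistent place.

```python
import hashlib
import os
import shutil


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_check(source, dest_dir, attempts=3):
    """Copy `source` into `dest_dir` several times, one copy at a time.

    Returns a list of (copy_path, matches_original) tuples.
    """
    expected = file_md5(source)
    results = []
    for attempt in range(attempts):
        target = os.path.join(dest_dir, "copy-%d" % attempt)
        shutil.copyfile(source, target)
        results.append((target, file_md5(target) == expected))
    return results


def first_difference(path_a, path_b, chunk_size=1 << 16):
    """Return the offset of the first differing byte, or None if identical.

    If one file is a strict prefix of the other, the offset returned is
    the length of the shorter file.
    """
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                for i, (x, y) in enumerate(zip(block_a, block_b)):
                    if x != y:
                        return offset + i
                return offset + min(len(block_a), len(block_b))
            if not block_a:
                return None  # both files ended with no difference found
            offset += len(block_a)
```

One caveat: reading a copy back immediately after writing it may be served from a cache rather than from the remote end, so a matching checksum right after the copy is weaker evidence than one taken later or from the other machine.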

As I say, this is the _previous_ incarnation of the same machine. The entire machine has been pretty much rebuilt from the bottom up, with only the case, fans, power supply, and LAN cable and switch port surviving from the original box! Everything else has been replaced.

Since none of the current problems has anything to do with the network (the builds are all being performed to/from local disk), I might start to suspect the power supply. But I am at a loss to understand how a faulty power supply would hiccup at exactly the same place in several successive builds that are run hours apart!

Needless to say, this box is sick, so I'm going to have to start the rebuild over again. :(


-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------

