Port-i386 archive


Flakey AMD box



I am trying to get to the bottom of why one of my i386 boxen is very
unreliable. It is my only 32-bit AMD CPU machine.

Symptoms are:
On boot, sometimes all is well; sometimes I get a panic, usually in uiomove
(mode=READ). A typical sequence is: reboot, panic, reboot, all well.

This is nasty: files which are built correctly into /usr/obj are then
corrupt when installed. Several times fate rolled a double 6 and
/usr/obj/lib/libc.i386/libc.so.12.179 was fine, but
/lib/libc.so.12.179 was a file with the same name, length, timestamp, but
containing "data" rather than a shared object. (Last round it happened
to both libc and libgcc_s which was fun.)
It isn't just those files, though it is immediately obvious when it
happens to them. (Happened again with yesterday's -current)
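One way to find the affected files, rather than waiting for a broken libc to
announce itself, would be to checksum each built file against its installed
copy. The following is only a minimal sketch in Python: the function names
sha256_of and find_mismatches are made up for illustration, and the list of
(built, installed) path pairs has to be supplied by hand, since the mapping
from /usr/obj to the installed tree is not derived here.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(pairs):
    """Given (built_path, installed_path) pairs, return those that differ."""
    return [(built, installed)
            for built, installed in pairs
            if sha256_of(built) != sha256_of(installed)]
```

For example, find_mismatches([("/usr/obj/lib/libc.i386/libc.so.12.179",
"/lib/libc.so.12.179")]) would return that pair if the two copies differ.
Note that both reads may be served from the buffer cache, so a match here
does not prove the on-disk blocks are good.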

I thought it must be flakey hardware, but I have now replaced the
disks with a known working pair of IDE disks, mirrored with RAIDframe and
plugged into hptide.  They (disks and controller card) come out of
a NetBSD/i386 server that has been rock solid for years.

I also swapped the Sempron 2200+ and 512MB DDR400 memory for an
Athlon XP 3000+ (model quoted from memory) and 3x512MB DDR333 memory.  Two
iterations of memtest86 4.0a were happy. The motherboard is an ASUS
A7V600-X, so a VIA chipset (e.g. VT8377).

Same flakiness - the only things I didn't change are the motherboard and
power supply, and a SATA drive is still connected but not mounted anywhere
special.

During builds, with that much memory, /usr/obj/* must be in the
cache, so on install, presumably the corrupt content gets written
to disk from the cache? So the contents of the cache changed between
the write from cache to the /usr/obj disk and the write from cache to
the / disk? Look in uvm? For what?
(Awful when there isn't a reproducible test... race condition?)
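If a good and a corrupt copy of the same file are both at hand, it may help
to see where they differ: damage confined to whole page-sized runs would
point more towards the cache or DMA than towards single-bit memory errors.
This is a rough sketch only; differing_pages is a hypothetical helper, and
the 4096-byte page size is an assumption (the usual value on i386).

```python
def differing_pages(path_a, path_b, page_size=4096):
    """Return the indices of page_size-byte blocks that differ between two files.

    page_size=4096 is an assumption matching the usual i386 page size.
    A block that exists in only one file (length mismatch) counts as differing.
    """
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            block_a = fa.read(page_size)
            block_b = fb.read(page_size)
            if not block_a and not block_b:
                break  # both files exhausted
            if block_a != block_b:
                diffs.append(index)
            index += 1
    return diffs
```

Running it on the /usr/obj copy and the installed copy of a corrupt library
gives a list of block numbers; consecutive runs of whole blocks would be
worth noting when deciding where to look next.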

Cheers,

Patrick

