Subject: Re: ASUS P55TP4 motherboard experiences?
To: John Nemeth <jnemeth@cue.bc.ca>
From: John F. Woods <jfw@jfwhome.funhouse.com>
List: port-i386
Date: 12/02/1995 22:20:21
> } "Parity is for farmers."
> } 	- Seymour Cray, on why the Cray 1 supercomputer did *not* have parity.

>     Oops.  That leaves the question of what the follow-ons, and other
> brands of supercomputers do.  Also, Seymour Cray may have been a
> genius, but that doesn't mean he was perfect.  I would be interested
> in his justification for the above quote (just because somebody famous
> says something doesn't mean you have to, or should accept it blindly).

I think his justification for ignoring the parity option was the belief that
if you cared about your data, you wanted ECC; and his justification for not
going with ECC was speed.  In fact, supercomputer users, especially
bleeding-edge supercomputer users, will perform all kinds of unnatural acts
for an extra 5% throughput, and even parity costs you a bit in speed.  (Pun
unintentional.)  But obviously Cray, and his customers, regarded the existing
memory technology as reliable enough that it was worth bothering with day-long
calculations without either parity or ECC.  (And most Cray-1 customers were
more than sophisticated enough to make that judgement call on an informed
basis.)
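
To make the parity-versus-ECC distinction concrete, here is a minimal sketch
of a single even-parity bit over one byte.  The function names are made up
for illustration and do not model any particular memory controller.  It shows
that parity can only flag an odd number of flipped bits; it cannot say which
bit flipped, and an even number of flips goes unnoticed.  That is why parity
alone buys detection but never correction.

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total count of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """True if the byte read back still agrees with its stored parity bit."""
    return even_parity_bit(byte) == stored_parity


# Hypothetical memory word and the parity bit stored alongside it.
word = 0b10110010
stored = even_parity_bit(word)

one_flip = word ^ 0b00000100    # a single bit flipped
two_flips = word ^ 0b00000110   # two bits flipped

single_error_detected = not parity_ok(one_flip, stored)
double_error_detected = not parity_ok(two_flips, stored)
```

With these values the single flip is caught and the double flip is not.  A
SECDED-style ECC code spends more check bits so that it can correct the
single flip and still detect the double one.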

> } With memory reliability as good as it is, parity is probably of dubious
> } utility; it is no longer to be *expected* that memory will fail during the
> } life of a computer.  Any application with data valuable enough to need
>     This may be so, but as somebody whose hats include "computer
> hardware technician", I know full well that memory can and will fail.
> So will cache, which is real fun to diagnose.  Anything that reduces
> the reliability and integrity of a system is extremely bad in my
> opinion.  All of my personal systems have parity, and I will never use
> one of the new-fangled motherboards that don't support parity.

Does the *bus* have parity?  Does the CPU have internal parity?  Is the
parity to your disk channel end-to-end, or does it get regenerated at the
disk interface?  Given that the answers are probably "no", "no", and "the
latter", just how shabby a system are you running there, anyway?  ;-)

(Of course, the last supercomputer *I* worked on had ECC and, yes, parity
checked inter-CPU busses...  Then again, I also saw an Ethernet card on one
fail in a REALLY bizarre way that caused it to curdle data so routinely
that the UDP checksum was occasionally failing to catch it[*], resulting in
curdled NFS files.  Whee!  It managed to do that despite parity checking
between the LANCE and the on-card memory, and between the card and CPU over
the I/O bus.)

[*]  It's one of those head-slapping moments here.  I have spent the past
two years idly pondering why the UDP checksum seemed to fail on that card
far more frequently than one false-negative packet per 65535 bad packets,
especially considering that FTP transfers to that same ailing host worked
perfectly reliably, if slowly.  But DUH: if the UDP checksum *field* is 0,
no check is performed, and I'll bet that's what the bad board was spewing
out instead of good bytes...
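
The footnote's point can be sketched in a few lines.  This is an illustrative
receiver-side check for UDP over IPv4, not code from any real stack; the
helper names and the addresses, ports, and payload are made up.  Per RFC 768,
an all-zero checksum field means the sender generated no checksum, so a
receiver has nothing to verify and a corrupted datagram sails through.  TCP,
which FTP runs over, has no such opt-out.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole 16-bit word
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: bytes, dst_ip: bytes, udp_length: int) -> bytes:
    """IPv4 pseudo-header: addresses, zero byte, protocol 17, UDP length."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, 17, udp_length)


def build_segment(ph: bytes, sport: int, dport: int, payload: bytes) -> bytes:
    """Build UDP header + payload with the checksum filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    csum = internet_checksum(ph + header + payload)
    if csum == 0:
        csum = 0xFFFF                        # computed zero is sent as all ones
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def udp_checksum_ok(ph: bytes, segment: bytes) -> bool:
    """Receiver-side check; the checksum field is bytes 6..7 of the header."""
    (field,) = struct.unpack_from("!H", segment, 6)
    if field == 0:
        return True                          # "no checksum": nothing is verified
    return internet_checksum(ph + segment) == 0


# Hypothetical datagram, chosen only for the demonstration.
payload = b"hello NFS"
ph = pseudo_header(bytes([192, 168, 0, 1]), bytes([192, 168, 0, 2]),
                   8 + len(payload))
good = build_segment(ph, 1023, 2049, payload)

curdled = bytearray(good)
curdled[8] ^= 0xFF                           # corrupt the first payload byte

zeroed = bytearray(curdled)
zeroed[6:8] = b"\x00\x00"                    # card also zeroes the checksum field

results = (udp_checksum_ok(ph, good),
           udp_checksum_ok(ph, bytes(curdled)),
           udp_checksum_ok(ph, bytes(zeroed)))
```

Here the intact datagram passes, the corrupted one is rejected, and the
corrupted one with a zeroed checksum field passes again, which matches the
failure mode guessed at above.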