Current-Users archive
Re: mcelog?
On Mon, Apr 08, 2019 at 12:01:00AM -0700, John Nemeth wrote:
> On Apr 7, 9:48pm, "Aaron J. Grier" wrote:
> > On Wed, Mar 20, 2019 at 11:22:13AM -0700, John Nemeth wrote:
> > > (XEN) Bank 4: 945a4000fd080813 at ef3581180
> > > (XEN) MCE: polling routine found correctable error. Use mcelog to parse above error output.
> > [...]
> > > In any event, if I'm reading the above correctly, I believe
> > > that it is telling me that there is bad memory?
> >
> > which CPU manufacturer and model is this? memory is just one of
> > many possibilities which can generate machine check events.
>
> cpu0: "AMD Opteron(tm) Processor 6386 SE "
> cpu0: AMD Family 15h (686-class)
> cpu0: family 0x15 model 0x2 stepping 0 (id 0x600f20)
https://www.amd.com/system/files/TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf
according to a cursory register decode based on the above document, it
looks like it could be an ECC-correctable memory error. there's
another MSR that keeps a count of how many DRAM errors have been
detected -- too bad NetBSD doesn't have an MSR driver. ;)
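for anyone wanting to repeat that kind of cursory decode, here's a
minimal Python sketch (not the exact procedure used above) that pulls
the Bank 4 status value out of the quoted Xen line and splits it into
fields. the helper names parse_xen_line/decode_status are hypothetical.
bits 63..57 and the compound error code in bits 15:0 are the
architectural MCi_STATUS layout; the CECC/UECC positions (bits 46/45)
and the 5-bit extended error code in bits 20:16 are my assumption of
the family 15h layout, so check them against the BKDG before trusting
the output.

```python
import re

# Architectural MCi_STATUS flag bits, plus AMD-specific ECC bits.
STATUS_FLAGS = {
    63: "VAL",    # register holds valid error information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was NOT corrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context may be corrupt
    46: "CECC",   # ASSUMPTION (AMD fam 15h): correctable ECC error
    45: "UECC",   # ASSUMPTION (AMD fam 15h): uncorrectable ECC error
}

REQUEST = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
LEVEL = ["L0", "L1", "L2", "LG"]                      # LL field
MEM_IO = ["MEM", "reserved", "IO", "GEN"]             # II field
PARTICIPATION = ["SRC", "RES", "OBS", "GEN"]          # PP field
TRANSACTION = ["INSTR", "DATA", "GEN", "reserved"]    # TT field


def parse_xen_line(line):
    """Hypothetical helper: (bank, status, addr) from a Xen MCE bank line."""
    m = re.search(r"Bank (\d+): ([0-9a-fA-F]+) at ([0-9a-fA-F]+)", line)
    if m is None:
        raise ValueError("not a Xen 'Bank N: STATUS at ADDR' line")
    return int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16)


def _request(code):
    """RRRR request type in bits 7:4 of the error code."""
    r = (code >> 4) & 0xF
    return REQUEST[r] if r < len(REQUEST) else "reserved"


def decode_error_code(code):
    """Classify the compound MCA error code held in MCi_STATUS[15:0]."""
    if (code & 0xF800) == 0x0800:    # bus: 0000 1PPT RRRR IILL
        return "BUS pp={} t={} rrrr={} ii={} ll={}".format(
            PARTICIPATION[(code >> 9) & 3], (code >> 8) & 1,
            _request(code), MEM_IO[(code >> 2) & 3], LEVEL[code & 3])
    if (code & 0xFC00) == 0x0400:    # internal unclassified: 0000 01xx xxxx xxxx
        return "INTERNAL"
    if (code & 0xFF00) == 0x0100:    # memory hierarchy: 0000 0001 RRRR TTLL
        return "MEM rrrr={} tt={} ll={}".format(
            _request(code), TRANSACTION[(code >> 2) & 3], LEVEL[code & 3])
    if (code & 0xFFF0) == 0x0010:    # TLB: 0000 0000 0001 TTLL
        return "TLB tt={} ll={}".format(TRANSACTION[(code >> 2) & 3], LEVEL[code & 3])
    return "other (0x{:04x})".format(code)


def decode_status(status):
    """Hypothetical helper: split an MCi_STATUS value into named fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        # ASSUMPTION: fam 15h ErrorCodeExt is 5 bits wide at 20:16.
        "ext_error_code": (status >> 16) & 0x1F,
        "error_code": decode_error_code(status & 0xFFFF),
    }


if __name__ == "__main__":
    bank, status, addr = parse_xen_line(
        "(XEN) Bank 4: 945a4000fd080813 at ef3581180")
    info = decode_status(status)
    print("bank", bank, "addr", hex(addr))
    print(info["flags"])           # ['VAL', 'EN', 'ADDRV', 'CECC']
    print(info["ext_error_code"])  # 8
    print(info["error_code"])      # BUS pp=SRC t=0 rrrr=RD ii=MEM ll=LG
```

under those assumptions the value comes out as valid, enabled,
address-valid and CECC with UC clear, and the low bits are a bus-class
memory read at the generic level. that's consistent with a corrected
DRAM ECC error, but the extended error code still has to be looked up
in the BKDG's Bank 4 table to be sure.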
--
Aaron J. Grier | "Not your ordinary poofy goof." | agrier%poofygoof.com@localhost
"The price of reliability is the pursuit of the utmost simplicity. It
is a price which the very rich find most hard to pay." -- Tony Hoare