Current-Users archive


Re: mcelog?



On Mon, Apr 08, 2019 at 12:01:00AM -0700, John Nemeth wrote:
> On Apr 7,  9:48pm, "Aaron J. Grier" wrote:
> > On Wed, Mar 20, 2019 at 11:22:13AM -0700, John Nemeth wrote:

> > > (XEN) Bank 4: 945a4000fd080813 at        ef3581180
> > > (XEN) MCE: polling routine found correctable error.  Use mcelog to parse above error output.
> > [...]
> > >      In any event, if I'm reading the above correctly, I believe
> > > that it is telling that there is bad memory?
> > 
> > which CPU manufacturer and model is this?  memory is just one of
> > many possibilities which can generate machine check events.
> 
> cpu0: "AMD Opteron(tm) Processor 6386 SE              "
> cpu0: AMD Family 15h (686-class)
> cpu0: family 0x15 model 0x2 stepping 0 (id 0x600f20)

https://www.amd.com/system/files/TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf

according to a cursory register decode based on the above document, it
does look like it could be an ECC-correctable memory error.  there's
another MSR that keeps a count of how many DRAM errors have been
detected -- too bad NetBSD doesn't have an MSR driver.  ;)
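
for anyone who wants to repeat the decode by hand, here is a minimal
sketch in Python.  the bit positions for VAL, OVER, UC, EN, MISCV, ADDRV
and PCC are the common x86 machine-check status layout; the CECC/UECC
positions, the extended error code field, and the bus-error field split
are my reading of the AMD document and should be treated as assumptions.
the function names are made up for illustration.

```python
# Sketch of an MCi_STATUS decode.  Field positions for CECC/UECC and the
# extended error code are assumptions based on AMD's Family 15h layout.
STATUS_FLAGS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was NOT corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MISC register holds valid data
    "ADDRV": 58,  # ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupt
    "CECC": 46,   # correctable ECC error (AMD, assumed position)
    "UECC": 45,   # uncorrectable ECC error (AMD, assumed position)
}


def decode_mc_status(status):
    """Split a 64-bit machine-check status value into flags and codes."""
    flags = {name: bool((status >> bit) & 1)
             for name, bit in STATUS_FLAGS.items()}
    return {
        "flags": flags,
        "ext_error_code": (status >> 16) & 0x1F,  # bits 20:16 (assumed)
        "error_code": status & 0xFFFF,            # bits 15:0
    }


def decode_bus_error(code):
    """Split a bus-format error code (0000 1PPT RRRR IILL) into fields.

    Returns None when the code is not in bus-error format.
    """
    if (code >> 11) != 0b00001:
        return None
    return {
        "pp": (code >> 9) & 0x3,    # participation
        "t": (code >> 8) & 0x1,     # timeout
        "rrrr": (code >> 4) & 0xF,  # memory transaction type
        "ii": (code >> 2) & 0x3,    # memory or I/O
        "ll": code & 0x3,           # cache level
    }


if __name__ == "__main__":
    decoded = decode_mc_status(0x945A4000FD080813)  # Bank 4 value above
    set_flags = [n for n, v in decoded["flags"].items() if v]
    print("flags set:", ", ".join(set_flags))
    print("extended error code: 0x%02x" % decoded["ext_error_code"])
    print("error code: 0x%04x" % decoded["error_code"])
    print("bus fields:", decode_bus_error(decoded["error_code"]))
```

for the Bank 4 value this reports VAL, EN, ADDRV and CECC set with UC
clear, an extended error code of 0x08, and a bus-format error code of
0x0813.  that is consistent with a corrected memory error, but it is
not proof of one; the tables in the document above are the authority
on what each code means.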

-- 
  Aaron J. Grier | "Not your ordinary poofy goof." | agrier%poofygoof.com@localhost
  "The price of reliability is the pursuit of the utmost simplicity.  It
   is a price which the very rich find most hard to pay."  -- Tony Hoare

