Re: what to do on memory or cache errors?
ECC is enabled only if
- memory controller supports ECC,
- memory module supports ECC, and
- both are configured to use ECC,
right? Then the error interrupt handler should invoke the handler
registered by the relevant memory module's device driver. Memory
controllers & modules are auto-conf'ed ideally. (Think memory hotplug.)
Similarly CPU drivers should register cache error handlers too, if needed.
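A rough sketch of that registration scheme in C, purely illustrative.
All names here (mem_err_register, mem_err_dispatch, mem_err_count) are
hypothetical and not an existing kernel API. It assumes each driver
claims a physical address range and the error interrupt handler looks
up the faulting address in a small table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types and names, for illustration only. */
typedef uint64_t paddr_t;
typedef int (*mem_err_handler_t)(paddr_t pa, void *cookie);

struct mem_err_reg {
	paddr_t start;              /* first physical address covered */
	paddr_t end;                /* one past the last address covered */
	mem_err_handler_t handler;  /* driver callback */
	void *cookie;               /* driver private data */
};

#define MEM_ERR_MAXREG 8
static struct mem_err_reg mem_err_regs[MEM_ERR_MAXREG];
static size_t mem_err_nregs;

/*
 * Called by a memory module (or CPU cache) driver at attach time.
 * Returns 0 on success, -1 if the table is full or the arguments
 * are invalid.
 */
int
mem_err_register(paddr_t start, paddr_t end, mem_err_handler_t h, void *cookie)
{
	if (mem_err_nregs == MEM_ERR_MAXREG || start >= end || h == NULL)
		return -1;
	mem_err_regs[mem_err_nregs++] =
	    (struct mem_err_reg){ start, end, h, cookie };
	return 0;
}

/*
 * Called from the error interrupt handler with the faulting address.
 * Returns the driver handler's result, or -1 if no driver claims
 * the address.
 */
int
mem_err_dispatch(paddr_t pa)
{
	for (size_t i = 0; i < mem_err_nregs; i++) {
		const struct mem_err_reg *r = &mem_err_regs[i];
		if (pa >= r->start && pa < r->end)
			return r->handler(pa, r->cookie);
	}
	return -1;
}

/* Example driver callback: bumps the error counter passed as cookie. */
int
mem_err_count(paddr_t pa, void *cookie)
{
	(void)pa;
	++*(unsigned *)cookie;
	return 0;
}
```

A real version would need locking and removal on detach to cope with
hotplug; both are left out here.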
I'm not sure what should be done in these handlers. Maybe we can learn
from high-end/mission-critical industrial/commercial systems?
I think notifications keyed on physical memory addresses are also needed
in other cases, such as migration. Good to see more interest in this area.
On Tue, Aug 23, 2011 at 5:58 AM, Matt Thomas <matt%3am-software.com@localhost> wrote:
> besides panicking, of course.
> This is going to involve a lot of help from UVM.
> It seems that uvm_fault is not the right place to handle this. Maybe we need
> void uvm_page_error(paddr_t pa, int etype);
> where etype would indicate if this was a memory or cache fault, was the cache
> line dirty, etc. If uvm_page_error can't "correct" the error, it would panic.
> Interactions with copyin/copyout will also need to be addressed.
> Preemptively, we could have a thread force dirty cache lines to memory if
> they've been in L2 "too long" (thereby reducing the problem to an ECC error
> on a clean cache line which means you just toss the cache-line contents.) We
> can also have a thread that reads all of memory (slowly) thereby causing any
> single bit errors to be corrected before they become double-bit errors.
> I'm not familiar enough with UVM internals to actually know what to do but I
> hope someone else reading this is.
> Comments anyone?
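
To make the proposed etype argument a bit more concrete, here is a
minimal sketch in C of how the flags might be encoded and turned into a
decision. The flag names, the action enum and pgerr_classify are all
made up for illustration. The policy only covers the cases mentioned in
the quoted mail (a clean cache line can be tossed, a corrected
single-bit memory error can be scrubbed, anything else panics) and
knows nothing about UVM page state.

```c
/* Hypothetical etype bits; not taken from any existing header. */
#define PGERR_MEMORY    0x01	/* error reported by main memory */
#define PGERR_CACHE     0x02	/* error reported by a cache */
#define PGERR_DIRTY     0x04	/* affected cache line was dirty */
#define PGERR_CORRECTED 0x08	/* ECC already corrected the data */

enum pgerr_action {
	PGERR_ACT_SCRUB,	/* write corrected data back to memory */
	PGERR_ACT_INVALIDATE,	/* toss the cache line, refetch later */
	PGERR_ACT_PANIC		/* cannot be corrected */
};

/* Map an etype value to the action a page-error routine might take. */
enum pgerr_action
pgerr_classify(int etype)
{
	if (etype & PGERR_CACHE) {
		/* A dirty line holds the only copy of the data. */
		return (etype & PGERR_DIRTY) ?
		    PGERR_ACT_PANIC : PGERR_ACT_INVALIDATE;
	}
	if ((etype & PGERR_MEMORY) && (etype & PGERR_CORRECTED))
		return PGERR_ACT_SCRUB;
	return PGERR_ACT_PANIC;
}
```

With help from UVM, some of the panic cases could presumably be softened
(for example, refetching a clean page from its backing store), but that
is beyond this sketch.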