tech-kern archive


what to do on memory or cache errors?



besides panicking, of course.

This is going to involve a lot of help from UVM.  

It seems that uvm_fault is not the right place to handle this.  Maybe we need a

void uvm_page_error(paddr_t pa, int etype);

where etype would indicate whether this was a memory or cache fault, whether 
the cache line was dirty, etc.  If uvm_page_error can't "correct" the error, 
it would panic.
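
To make the etype idea concrete, here is a rough sketch of how the flags
might be encoded and turned into a decision.  None of these names exist in
UVM; the PGERR_* bits, pgerr_classify(), and the "page has a backing copy"
input are all assumptions made up for illustration, and a real
uvm_page_error would need to look at the actual page state.

```c
#include <stdbool.h>

/* Hypothetical etype bits; nothing here is an existing UVM interface. */
#define PGERR_MEMORY 0x01  /* error was detected in main memory */
#define PGERR_CACHE  0x02  /* error was detected in a cache line */
#define PGERR_DIRTY  0x04  /* the affected cache line was dirty */
#define PGERR_UNCORR 0x08  /* hardware could not correct it (e.g. double-bit) */

enum pgerr_action {
    PGERR_ACT_SCRUB,    /* corrected error: rewrite the data and carry on */
    PGERR_ACT_DISCARD,  /* another good copy exists: drop this one, refetch */
    PGERR_ACT_PANIC     /* the only copy of the data is gone */
};

/*
 * Decide what to do about a reported error.  page_has_backing_copy is an
 * assumed input meaning "the page is clean and can be re-read from its
 * backing store".
 */
enum pgerr_action
pgerr_classify(int etype, bool page_has_backing_copy)
{
    /* Correctable errors only need the corrected data written back. */
    if ((etype & PGERR_UNCORR) == 0)
        return PGERR_ACT_SCRUB;

    /* A clean cache line can simply be tossed; memory still has the data. */
    if ((etype & PGERR_CACHE) != 0 && (etype & PGERR_DIRTY) == 0)
        return PGERR_ACT_DISCARD;

    /* A bad memory page can be replaced if it can be re-read elsewhere. */
    if ((etype & PGERR_MEMORY) != 0 && page_has_backing_copy)
        return PGERR_ACT_DISCARD;

    /* Dirty cache line or unbacked memory: nothing left to recover from. */
    return PGERR_ACT_PANIC;
}
```

The point of the sketch is only that the panic case shrinks to "the sole copy
of the data was lost"; everything else has some recovery path.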

Interactions with copyin/copyout will also need to be addressed.

Preemptively, we could have a thread force dirty cache lines to memory if 
they've been in L2 "too long" (thereby reducing the problem to an ECC error on 
a clean cache line, which means you just toss the cache-line contents).  We 
could also have a thread that reads all of memory (slowly), thereby causing any 
single-bit errors to be corrected before they become double-bit errors.
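
The slow memory reader could look roughly like the sketch below.  The
scrub_state structure and scrub_step() are hypothetical names, not anything
that exists in the kernel, and the sketch assumes the region is already
mapped and readable.  Whether a plain read actually rewrites the corrected
word in DRAM depends on the memory controller; some hardware needs an
explicit write-back.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scrubber state: a region of memory and a cursor into it. */
struct scrub_state {
    const volatile uint64_t *base;  /* start of the region to scrub */
    size_t nwords;                  /* region length in 64-bit words */
    size_t next;                    /* index of the next word to read */
};

/*
 * Read at most `budget` words, wrapping around at the end of the region.
 * Returns the number of words read.  The caller is expected to sleep
 * between calls so the scrubber stays slow and unobtrusive.
 */
size_t
scrub_step(struct scrub_state *st, size_t budget)
{
    size_t done = 0;

    while (done < budget && st->nwords != 0) {
        /* The volatile read forces a real load, giving ECC a chance to fire. */
        (void)st->base[st->next];
        st->next = (st->next + 1) % st->nwords;
        done++;
    }
    return done;
}
```

A kernel thread would call scrub_step() with a small budget, sleep, and
repeat, so that all of memory is visited over minutes or hours rather than
in one burst.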

I'm not familiar enough with UVM internals to know what to do, but I hope 
someone else reading this is.

Comments anyone?

