Port-vax archive


Re: my simh-vax crashed: panic: pmap_enter on PG_SREF page



On 2023-12-21 at 03:50, Kalvis Duckmanton wrote:
On 21/12/23 13:05, Greg Oster wrote:
On 2023-12-20 19.42, Johnny Billquist wrote:
Nothing concrete, except that that amount of memory did have issues in the past. We hope it had been fixed, but maybe there is some kind of issue with that much memory still? Could you try with less physical memory? Like 128M?

I've seen this error at least 3 times now too.... this last time was with 256MB, but I'm pretty sure the first two were with just 64MB RAM.

I've seen this also - from my notes it appeared to be due to stale entries in the translation buffer for the pages in system space holding the page table entries for P0 or P1 space.  Flushing the translation buffer when extending P0 or P1 space seems to have fixed it.  I'm attaching a patch for consideration.
Hm, interesting. Let us try to follow this path for p0 expansion :-)

- grow_p0() is called from pmap_enter(), which is also where the panic() happens later on.
- pmap_enter() will call TBIA at the end, so if your idea is correct then the problem is that the virtual PTE mapping in kernel space is stale and points to some other mapping.
- In that case this will happen when reading oldpte from *pteptr. pteptr points into the new space allocated after the grow.
- To get this new space that pteptr points into, extent_alloc() is called.
- If it fails, rmproc() is called to try to find some process that is OK for swapping (similar logic), and it removes that process's mappings to free up some space in the user page table map.
- When space is found, the system page table PTEs that map the user page tables are copied (from the old place) and cleared (the new mappings).

So:
- if there is a process that has been removed from its process space and that process space has been reused, and
- its PTEs had entries in the TLB, and
- its user process pages have been referenced (SREF set), then
- the stale page entry may be read and cause this fault.
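
To make the suspected sequence concrete, here is a toy model in Python. It is only a sketch of the idea, not the NetBSD/vax pmap code: the ToyMMU class, read_pte_after_grow(), the page and frame numbers, and the TOY_SREF bit value are all made up for illustration. It shows a system virtual page that used to map the removed process's page table being remapped to freshly cleared PTEs, and what a read through the translation buffer returns with and without a flush at remap time.

```python
TOY_SREF = 0x1  # hypothetical flag bit for this toy model, not the real PG_SREF value


class ToyMMU:
    """Toy model: an authoritative page table plus a translation buffer (TLB)."""

    def __init__(self):
        self.system_page_table = {}  # system virtual page -> physical frame
        self.tlb = {}                # cached translations, may go stale
        self.frames = {}             # physical frame -> contents (one toy PTE)

    def map_page(self, vpage, frame, flush_tlb):
        """Change the mapping; optionally invalidate the whole TLB (like TBIA)."""
        self.system_page_table[vpage] = frame
        if flush_tlb:
            self.tlb.clear()

    def read(self, vpage):
        """Read through the TLB, filling it from the page table on a miss."""
        if vpage not in self.tlb:
            self.tlb[vpage] = self.system_page_table[vpage]
        return self.frames[self.tlb[vpage]]


def read_pte_after_grow(flush_tlb):
    """Return the toy PTE seen through a system page that has been reused."""
    mmu = ToyMMU()
    vpage = 7                            # arbitrary system virtual page number
    mmu.frames[100] = TOY_SREF | 0x40    # PTE of the process that gets removed
    mmu.frames[200] = 0                  # freshly cleared PTE for the grown space

    # The old process's page table is mapped and touched, so the TLB caches it.
    mmu.map_page(vpage, 100, flush_tlb=True)
    mmu.read(vpage)

    # The old process is removed and the same system page is reused for the
    # grown page table. The only difference between runs is the flush here.
    mmu.map_page(vpage, 200, flush_tlb=flush_tlb)
    return mmu.read(vpage)               # stands in for "oldpte = *pteptr"


if __name__ == "__main__":
    print("without flush:", hex(read_pte_after_grow(False)))
    print("with flush:   ", hex(read_pte_after_grow(True)))
```

Without the flush, the read goes through the cached translation and returns the old process's PTE with the toy SREF bit still set. With the flush, it returns the cleared PTE. In this model a TBIA at the end of pmap_enter() would come too late, since the stale read has already happened by then.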

Any comments on the flow of events that I just wrote down above? Does it seem correct?
If so, it seems that you might have found yet another bug, Kalvis :-)

Also, since SREF is set on the page, it means that a user process page is modified, and that information will now be lost. Maybe this should be propagated upwards before removing the user process space?

-- Ragge

