NetBSD-Bugs archive


Re: port-alpha/53809: kernel locks up




> On Jan 1, 2019, at 10:15 PM, Martin Husemann <martin%duskware.de@localhost> wrote:
> 
> The following reply was made to PR port-alpha/53809; it has been noted by GNATS.
> 
> From: Martin Husemann <martin%duskware.de@localhost>
> To: gnats-bugs%NetBSD.org@localhost
> Cc: 
> Subject: Re: port-alpha/53809: kernel locks up
> Date: Wed, 2 Jan 2019 07:10:33 +0100
> 
> With a DEBUG kernel I get:
> 
> [ 2334.1883937] panic: pmap_emulate_reference: !write but not FOR|FOE

What this panic indicates is that pmap_emulate_reference() was called with either ALPHA_MMCSR_FOR ("fault on read") or ALPHA_MMCSR_FOE ("fault on execute"), but that the PTE for the faulting address does not have the FOR or FOE bits set.  This is, of course, an inconsistency... but looking more closely, I think that this particular DEBUG check is racy on an MP system and thus probably tripping unnecessarily.  Consider:

Process A (cpu0)                       Process B (cpu1)
Exec libc page with printf (FOE)
Performs FOE DEBUG check               Exec libc page with printf (FOE)
pmap_changebit()'s FOE to "off"        Performs FOE DEBUG check
                                       BOOM

If the pmap_changebit() call happens to clear the FOE bit in process B's PTE before cpu1 performs the DEBUG check, then it will fire needlessly.
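To make the ordering concrete, here is a minimal single-threaded sketch that replays the interleaving above.  Everything prefixed sk_ is hypothetical and exists only for illustration, the bit values are placeholders rather than the real PG_* definitions, and sk_clear_foe_all() is only a stand-in for the effect of pmap_changebit() on every PTE that maps the page.  It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not taken from the NetBSD/alpha headers. */
#define SK_PG_V      0x1u	/* mapping valid */
#define SK_PG_FOE    0x8u	/* fault on execute */

#define SK_NMAPPINGS 2		/* process A's and process B's PTE */

/* Hypothetical stand-in for the set of PTEs mapping one physical page. */
struct sk_page {
	uint64_t pte[SK_NMAPPINGS];
};

/* The DEBUG check reduced to the FOE case; true means "would panic". */
static bool
sk_debug_check_would_panic(const struct sk_page *pg, int who)
{
	assert(who >= 0 && who < SK_NMAPPINGS);
	return (pg->pte[who] & SK_PG_FOE) == 0;
}

/* Stand-in for pmap_changebit(): clear FOE in every mapping of the page. */
static void
sk_clear_foe_all(struct sk_page *pg)
{
	for (int i = 0; i < SK_NMAPPINGS; i++)
		pg->pte[i] &= ~(uint64_t)SK_PG_FOE;
}

/*
 * Replay the race.  Both CPUs have already taken the FOE fault when
 * this starts.  Returns true if process B's DEBUG check would panic.
 */
static bool
sk_replay_race(bool b_checks_before_a_clears)
{
	struct sk_page pg = {
		.pte = { SK_PG_V | SK_PG_FOE, SK_PG_V | SK_PG_FOE }
	};
	bool b_panics;

	/* cpu0: process A's check always passes; FOE is still set. */
	assert(!sk_debug_check_would_panic(&pg, 0));

	if (b_checks_before_a_clears) {
		/* cpu1 wins: B checks first, then A clears FOE. */
		b_panics = sk_debug_check_would_panic(&pg, 1);
		sk_clear_foe_all(&pg);
	} else {
		/* cpu0 wins: A clears FOE, then B checks.  BOOM. */
		sk_clear_foe_all(&pg);
		b_panics = sk_debug_check_would_panic(&pg, 1);
	}
	return b_panics;
}
```

Calling sk_replay_race(false) models the losing order from the table: B's fault was perfectly legitimate when it was taken, but by the time B looks at its PTE the bit is already gone.  sk_replay_race(true) models the order in which nothing trips.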

Anyway, I think the DEBUG panic you're seeing is a red herring, and not related to the real problem -- without that DEBUG check, process B on cpu1 would simply do some redundant work under the correct locking conditions.  It's only the DEBUG check that's wrong.  I'm not sure it's possible to actually make the DEBUG check really MP-safe; once you've taken the fault-on-whatever on cpu1, you're doomed if you do the check.  That DEBUG block was last touched:

1.22         (thorpej  26-Mar-98): #ifdef DEBUG                         /* These checks are more expensive */
1.22         (thorpej  26-Mar-98):      if (!pmap_pte_v(pte))
1.22         (thorpej  26-Mar-98):              panic("pmap_emulate_reference: invalid pte");
1.203        (chs      24-Aug-03):      if (type == ALPHA_MMCSR_FOW) {
1.22         (thorpej  26-Mar-98):              if (!(*pte & (user ? PG_UWE : PG_UWE | PG_KWE)))
1.22         (thorpej  26-Mar-98):                      panic("pmap_emulate_reference: write but unwritable");
1.22         (thorpej  26-Mar-98):              if (!(*pte & PG_FOW))
1.22         (thorpej  26-Mar-98):                      panic("pmap_emulate_reference: write but not FOW");
1.22         (thorpej  26-Mar-98):      } else {
1.22         (thorpej  26-Mar-98):              if (!(*pte & (user ? PG_URE : PG_URE | PG_KRE)))
1.22         (thorpej  26-Mar-98):                      panic("pmap_emulate_reference: !write but unreadable");
1.22         (thorpej  26-Mar-98):              if (!(*pte & (PG_FOR | PG_FOE)))
1.22         (thorpej  26-Mar-98):                      panic("pmap_emulate_reference: !write but not FOR|FOE");
1.22         (thorpej  26-Mar-98):      }
1.22         (thorpej  26-Mar-98):      /* Other diagnostics? */
1.22         (thorpej  26-Mar-98): #endif

----------------------------
revision 1.22
date: 1998-03-26 02:18:03 +0000;  author: thorpej;  state: Exp;  lines: +2784 -2684;
Remove the Mach 3 pmap from the tree, replacing it with the contents of
pmap.old.<whatever>.  To see the history, look at the corresponding
pmap.old.<whatever> file.
----------------------------

(Chuq's change in rev 1.203 doesn't affect the logic of the DEBUG check...)

...which definitely predates adding multiprocessor support to the Alpha pmap, so I'm not surprised that it's buggy, or that no one noticed before now.  How many people really run DEBUG kernels?

Unfortunately, I don't think this helps narrow down the real problem you're seeing :-(

-- thorpej


