NetBSD-Bugs archive


Re: port-alpha/53809: kernel locks up



The following reply was made to PR port-alpha/53809; it has been noted by GNATS.

From: Jason Thorpe <thorpej%me.com@localhost>
To: "gnats-bugs%netbsd.org@localhost" <gnats-bugs%NetBSD.org@localhost>
Cc: port-alpha-maintainer%netbsd.org@localhost,
 gnats-admin%netbsd.org@localhost,
 netbsd-bugs%netbsd.org@localhost,
 "martin%netbsd.org@localhost" <martin%NetBSD.org@localhost>
Subject: Re: port-alpha/53809: kernel locks up
Date: Tue, 1 Jan 2019 23:34:10 -0800

 > On Jan 1, 2019, at 10:15 PM, Martin Husemann <martin%duskware.de@localhost> wrote:
 >
 > The following reply was made to PR port-alpha/53809; it has been noted by GNATS.
 >
 > From: Martin Husemann <martin%duskware.de@localhost>
 > To: gnats-bugs%NetBSD.org@localhost
 > Cc:
 > Subject: Re: port-alpha/53809: kernel locks up
 > Date: Wed, 2 Jan 2019 07:10:33 +0100
 >
 > With a DEBUG kernel I get:
 >
 > [ 2334.1883937] panic: pmap_emulate_reference: !write but not FOR|FOE
 
 What this panic indicates is that pmap_emulate_reference() was called
 with either ALPHA_MMCSR_FOR ("fault on read") or ALPHA_MMCSR_FOE
 ("fault on execute"), but the PTE for the faulting address has neither
 the FOR nor the FOE bit set.  This is, of course, an inconsistency...
 but looking more closely, I think that this particular DEBUG check is
 racy on an MP system and is thus probably tripping unnecessarily.
 Consider:
 
 Process A (cpu0)                    Process B (cpu1)
 Exec libc page with printf (FOE)
 Performs FOE DEBUG check            Exec libc page with printf (FOE)
 pmap_changebit()'s FOE to "off"     Performs FOE DEBUG check
                                     BOOM
 
 If the pmap_changebit() call happens to clear the FOE bit in process
 B's PTE before cpu1 performs the DEBUG check, then the check will fire
 needlessly.
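 
 A minimal sketch of that interleaving, replayed in a single thread so
 the ordering is fixed.  Every name and bit value here (the sk_ and SK_
 identifiers) is hypothetical and chosen for illustration; this is not
 the real Alpha PTE layout or the kernel's code, and it ignores locking
 entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not the real Alpha PTE layout. */
#define SK_PG_V   0x1u  /* valid */
#define SK_PG_FOR 0x2u  /* fault on read */
#define SK_PG_FOE 0x4u  /* fault on execute */

enum sk_result { SK_EMULATED, SK_REDUNDANT, SK_PANIC };

/* Stand-in for pmap_changebit(): clear FOR/FOE in every mapping of the page. */
static void
sk_changebit(uint32_t ptes[], int nptes)
{
	for (int i = 0; i < nptes; i++)
		ptes[i] &= ~(SK_PG_FOR | SK_PG_FOE);
}

/*
 * Software half of the fault, run by a CPU whose hardware fault was
 * already delivered.  With debug set, it applies the "!write but not
 * FOR|FOE" test to its own PTE before doing any work.
 */
static enum sk_result
sk_emulate_reference(uint32_t ptes[], int nptes, int self, bool debug)
{
	assert(ptes[self] & SK_PG_V);
	bool cleared = (ptes[self] & (SK_PG_FOR | SK_PG_FOE)) == 0;

	if (debug && cleared)
		return SK_PANIC;	/* the check fires */
	sk_changebit(ptes, nptes);
	return cleared ? SK_REDUNDANT : SK_EMULATED;
}

/* Replay the interleaving: cpu0 finishes before cpu1 reaches its check. */
static enum sk_result
sk_replay_race(bool debug)
{
	/* ptes[0] is process A's mapping of the page, ptes[1] is process B's. */
	uint32_t ptes[2] = { SK_PG_V | SK_PG_FOE, SK_PG_V | SK_PG_FOE };

	/* Both CPUs have taken the FOE fault by now; cpu0 runs first. */
	enum sk_result a = sk_emulate_reference(ptes, 2, 0, debug);
	(void)a;

	/* cpu1 now looks at its own PTE, which cpu0 has already changed. */
	return sk_emulate_reference(ptes, 2, 1, debug);
}
```

 In this model, sk_replay_race(true) ends in SK_PANIC even though
 nothing is actually wrong with the mappings, while sk_replay_race(false)
 reports the second CPU's pass as SK_REDUNDANT.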
 
 Anyway, I think the DEBUG panic you're seeing is a red herring, and not
 related to the real problem -- without that DEBUG check, process B on
 cpu1 would simply do some redundant work under the correct locking
 conditions.  It's only the DEBUG check that's wrong.  I'm not sure it's
 possible to actually make the DEBUG check really MP-safe; once you've
 taken the fault-on-whatever on cpu1, you're doomed if you do the check.
 That DEBUG block was last touched:
 
 1.22         (thorpej  26-Mar-98): #ifdef DEBUG                         /* These checks are more expensive */
 1.22         (thorpej  26-Mar-98):      if (!pmap_pte_v(pte))
 1.22         (thorpej  26-Mar-98):              panic("pmap_emulate_reference: invalid pte");
 1.203        (chs      24-Aug-03):      if (type == ALPHA_MMCSR_FOW) {
 1.22         (thorpej  26-Mar-98):              if (!(*pte & (user ? PG_UWE : PG_UWE | PG_KWE)))
 1.22         (thorpej  26-Mar-98):                      panic("pmap_emulate_reference: write but unwritable");
 1.22         (thorpej  26-Mar-98):              if (!(*pte & PG_FOW))
 1.22         (thorpej  26-Mar-98):                      panic("pmap_emulate_reference: write but not FOW");
 1.22         (thorpej  26-Mar-98):      } else {
 1.22         (thorpej  26-Mar-98):              if (!(*pte & (user ? PG_URE : PG_URE | PG_KRE)))
 1.22         (thorpej  26-Mar-98):                      panic("pmap_emulate_reference: !write but unreadable");
 1.22         (thorpej  26-Mar-98):              if (!(*pte & (PG_FOR | PG_FOE)))
 1.22         (thorpej  26-Mar-98):                      panic("pmap_emulate_reference: !write but not FOR|FOE");
 1.22         (thorpej  26-Mar-98):      }
 1.22         (thorpej  26-Mar-98):      /* Other diagnostics? */
 1.22         (thorpej  26-Mar-98): #endif
 
 ----------------------------
 revision 1.22
 date: 1998-03-26 02:18:03 +0000;  author: thorpej;  state: Exp;  lines: +2784 -2684;
 Remove the Mach 3 pmap from the tree, replacing it with the contents of
 pmap.old.<whatever>.  To see the history, look at the corresponding
 pmap.old.<whatever> file.
 ----------------------------
 
 (Chuq's change in rev 1.203 doesn't affect the logic of the DEBUG check...)
 
 ...which definitely predates adding multiprocessor support to the Alpha
 pmap, so I'm not surprised that it's buggy and that no one noticed
 before now.  How many people really run DEBUG kernels?
 
 Unfortunately, I don't think this helps narrow down the real problem
 you're seeing :-(
 
 -- thorpej
 

