NetBSD-Bugs archive
Re: port-alpha/53809: kernel locks up
The following reply was made to PR port-alpha/53809; it has been noted by GNATS.
From: Jason Thorpe <thorpej%me.com@localhost>
To: "gnats-bugs%netbsd.org@localhost" <gnats-bugs%NetBSD.org@localhost>
Cc: port-alpha-maintainer%netbsd.org@localhost,
gnats-admin%netbsd.org@localhost,
netbsd-bugs%netbsd.org@localhost,
"martin%netbsd.org@localhost" <martin%NetBSD.org@localhost>
Subject: Re: port-alpha/53809: kernel locks up
Date: Tue, 1 Jan 2019 23:34:10 -0800
> On Jan 1, 2019, at 10:15 PM, Martin Husemann <martin%duskware.de@localhost> wrote:
>
> The following reply was made to PR port-alpha/53809; it has been noted by GNATS.
>
> From: Martin Husemann <martin%duskware.de@localhost>
> To: gnats-bugs%NetBSD.org@localhost
> Cc:
> Subject: Re: port-alpha/53809: kernel locks up
> Date: Wed, 2 Jan 2019 07:10:33 +0100
>
> With a DEBUG kernel I get:
>
> [ 2334.1883937] panic: pmap_emulate_reference: !write but not FOR|FOE
What this panic indicates is that pmap_emulate_reference() was called
with either ALPHA_MMCSR_FOR ("fault on read") or ALPHA_MMCSR_FOE ("fault
on execute"), but that the PTE for the faulting address does not have
the FOR or FOE bits set. This is, of course, an inconsistency... but
looking more closely, I think that this particular DEBUG check is racy
on an MP system and thus probably tripping unnecessarily. Consider:

Process A (cpu0)                    Process B (cpu1)

Exec libc page with printf (FOE)
Performs FOE DEBUG check            Exec libc page with printf (FOE)
pmap_changebit()'s FOE to "off"     Performs FOE DEBUG check

                                    BOOM

If the pmap_changebit() call happens to clear the FOE bit in process B's
PTE before cpu1 performs the DEBUG check, then it will fire needlessly.
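
To make the interleaving above concrete, here is a minimal single-threaded sketch that replays it step by step. It is not the NetBSD pmap code: the sim_* names, the bit values, and the two-entry array standing in for the PTEs that map the same libc page are all hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real Alpha PTE bit values. */
#define SIM_PG_V   0x1u   /* mapping is valid */
#define SIM_PG_FOE 0x2u   /* fault on execute */

typedef uint32_t sim_pte_t;

/* The hardware traps if FOE is set at the moment of the access. */
static bool sim_take_foe_fault(const sim_pte_t *pte)
{
    return (*pte & SIM_PG_FOE) != 0;
}

/* The DEBUG-style check, run later inside the fault handler.
 * Returns true if the PTE still looks consistent with an FOE fault. */
static bool sim_debug_check(const sim_pte_t *pte)
{
    return (*pte & SIM_PG_V) != 0 && (*pte & SIM_PG_FOE) != 0;
}

/* Models the effect described for pmap_changebit(): clear FOE in every
 * PTE that maps the page, not only in the faulting process's PTE. */
static void sim_changebit_clear_foe(sim_pte_t *ptes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ptes[i] &= ~SIM_PG_FOE;
}

/* Replays the interleaving. Returns true if cpu1's check would panic
 * even though cpu1 took a perfectly legitimate fault. */
bool sim_replay_race(void)
{
    /* ptes[0] is process A's mapping, ptes[1] is process B's. */
    sim_pte_t ptes[2] = { SIM_PG_V | SIM_PG_FOE, SIM_PG_V | SIM_PG_FOE };

    bool a_faulted  = sim_take_foe_fault(&ptes[0]); /* cpu0: exec, trap   */
    bool a_check_ok = sim_debug_check(&ptes[0]);    /* cpu0: check passes */
    bool b_faulted  = sim_take_foe_fault(&ptes[1]); /* cpu1: exec, trap   */
    sim_changebit_clear_foe(ptes, 2);               /* cpu0: FOE "off"    */
    bool b_check_ok = sim_debug_check(&ptes[1]);    /* cpu1: check fails  */

    return a_faulted && a_check_ok && b_faulted && !b_check_ok;
}
```

Calling sim_replay_race() from a test harness returns true: process B's fault was real when it was taken, but by the time its check runs the bit it tests has already been cleared on its behalf.
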
Anyway, I think the DEBUG panic you're seeing is a red herring, and not
related to the real problem -- without that DEBUG check, process B on
cpu1 would simply do some redundant work under the correct locking
conditions. It's only the DEBUG check that's wrong. I'm not sure it's
possible to actually make the DEBUG check really MP-safe; once you've
taken the fault-on-whatever on cpu1, you're doomed if you do the check.
That DEBUG block was last touched:

1.22  (thorpej 26-Mar-98): #ifdef DEBUG /* These checks are more expensive */
1.22  (thorpej 26-Mar-98):     if (!pmap_pte_v(pte))
1.22  (thorpej 26-Mar-98):         panic("pmap_emulate_reference: invalid pte");
1.203 (chs     24-Aug-03):     if (type == ALPHA_MMCSR_FOW) {
1.22  (thorpej 26-Mar-98):         if (!(*pte & (user ? PG_UWE : PG_UWE | PG_KWE)))
1.22  (thorpej 26-Mar-98):             panic("pmap_emulate_reference: write but unwritable");
1.22  (thorpej 26-Mar-98):         if (!(*pte & PG_FOW))
1.22  (thorpej 26-Mar-98):             panic("pmap_emulate_reference: write but not FOW");
1.22  (thorpej 26-Mar-98):     } else {
1.22  (thorpej 26-Mar-98):         if (!(*pte & (user ? PG_URE : PG_URE | PG_KRE)))
1.22  (thorpej 26-Mar-98):             panic("pmap_emulate_reference: !write but unreadable");
1.22  (thorpej 26-Mar-98):         if (!(*pte & (PG_FOR | PG_FOE)))
1.22  (thorpej 26-Mar-98):             panic("pmap_emulate_reference: !write but not FOR|FOE");
1.22  (thorpej 26-Mar-98):     }
1.22  (thorpej 26-Mar-98):     /* Other diagnostics? */
1.22  (thorpej 26-Mar-98): #endif

----------------------------
revision 1.22
date: 1998-03-26 02:18:03 +0000; author: thorpej; state: Exp; lines: +2784 -2684;
Remove the Mach 3 pmap from the tree, replacing it with the contents of
pmap.old.<whatever>. To see the history, look at the corresponding
pmap.old.<whatever> file.
----------------------------

(Chuq's change in rev 1.203 doesn't affect the logic of the DEBUG check...)

...which definitely predates adding multiprocessor support to the Alpha
pmap, so I'm not surprised that it's buggy, or that no one noticed
before now: how many people really run DEBUG kernels?
Unfortunately, I don't think this helps narrow down the real problem
you're seeing :-(
-- thorpej