Port-hppa archive


Re: Flaky pthread locks on HP 9000/360



I wrote:
> (2) there's something, perhaps platform-specific, wrong in the "RAS"
> mechanism that is used to provide atomic_cas when the hardware has
> no such instruction.

Bingo.  Observe the two calls to ras_lookup in
sys/arch/hppa/hppa/hppa_machdep.c:

cpu_getmcontext():
        ras_pc = (__greg_t)ras_lookup(l->l_proc,
            (void *)(gr[_REG_PCOQH] & ~HPPA_PC_PRIV_MASK));

hppa_ras():
	rasaddr = (intptr_t)ras_lookup(p, (void *)tf->tf_iioq_head);

One of these is not like the other.

What is going on here is that HPPA encodes the processor privilege
level in the low-order two bits of saved PC values (relying on
instruction addresses being word-aligned).  For an interrupt out of
userspace, those bits will be 11.  cpu_getmcontext masks them off
before calling ras_lookup; hppa_ras does not.
The impact of this can be seen by noting that the range test in
ras_lookup is

	if (addr > rp->ras_startaddr && addr < rp->ras_endaddr) {

The clear intention of this ras_lookup code is that a PC value that
is exactly equal to ras_startaddr is not a hit; this makes sense
because if we're at the start of the RAS sequence there is surely
no need to re-start it.  Thus, while cpu_getmcontext will correctly
perceive that there's no need to hack the PC value when we interrupted
at the RAS start, hppa_ras will falsely detect a match (because 11 > 00)
and decide to reset the PC value.

But if it's just resetting to the same value, no harm is done, right?
Almost.  The full coding in hppa_ras is:

        rasaddr = (intptr_t)ras_lookup(p, (void *)tf->tf_iioq_head);
        if (rasaddr != -1) {
                rasaddr |= HPPA_PC_PRIV_USER;
                tf->tf_iioq_head = rasaddr;
                tf->tf_iioq_tail = rasaddr + 4;
        }

The "|=" line is reinstalling the userspace privilege-level bits, so
what we put back into tf->tf_iioq_head is indeed the same value that
was there.  But what of tf_iioq_tail?  Maybe it's not always PC+4?
I threw some printf's into hppa_machdep.c, and soon enough I had
a smoking gun:

[   848.727582] RAS2: substitute 0xadd3e018 for 0xadd3e01b, 0xadd3df4f <---!!
[  1082.179723] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1082.203798] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1082.222168] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1082.240096] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1082.259618] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1107.877984] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1109.016943] RAS2: substitute 0xadd3e008 for 0xadd3e00f, 0xadd3e013
[  1109.034806] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1109.054393] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1109.072701] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1109.090514] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1109.110094] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1109.128332] RAS2: substitute 0xadd3e008 for 0xadd3e00b, 0xadd3e00f
[  1113.558201] RAS2: substitute 0xadd3e018 for 0xadd3e01b, 0xadd3df4f <---!!
[  1113.582706] RAS2: substitute 0xadd3e018 for 0xadd3e01b, 0xadd3e01f
[  1113.602262] RAS2: substitute 0xadd3e018 for 0xadd3e01b, 0xadd3e01f

(This is printing the ras_lookup result, tf_iioq_head, and tf_iioq_tail;
see attached patch.)  The instance at uptime 848 didn't seem to have
any obvious effect, but I think the one at 1113 is the explanation
for the tar crash I saw shortly thereafter.  (I was running a loop
of "tar tvzf" invocations during this segment of the dmesg log.)
In the coredump from that crash, gdb showed 0xadd3e018 as being the
address of _atomic_cas_16_up.

Of course, one crash doesn't prove much, but what convinces me that this
is a correct diagnosis is that fixing hppa_ras to mask off those bits
seems to have stopped the libpthread crashes completely.  I've been
beating on the machine pretty hard for the better part of a day and
not seen any more of them, whereas unpatched I was having a hard time
avoiding them.

I'm not enough of an HPPA hardware maven to be sure of why tf_iioq_tail
is sometimes different from PC+4, but I note that my C360 is of the
"HPPA 2.0" architecture, which HP touted for having out-of-order
execution capabilities.  I theorize that the machine is sometimes
capable of taking an interrupt with some prior instruction incompletely
executed, which it represents in the saved state in this way.  If we
overwrite tf_iioq_tail with rasaddr + 4, that prior instruction never
happens at all when we return control to the program, and badness ensues.

Of course this raises the question of whether the rasaddr + 4 recipe
is ever good enough, but it does seem to work for interrupts taken
*within* the RAS sequences that NetBSD uses.  I've monitored the
results with these printfs and not seen any counterexamples as long
as we take care to exclude the initial load-from-memory instruction.

Digging in the CVS history, I see that these two calls have looked like
this since HPPA RAS support was first added, in 2004.  So if this bug is
old enough to vote, why wasn't it identified long since?  One plausible
theory is that only some minority of HPPA 2.0 systems have the ability
to get into this state, and until fairly recently NetBSD didn't work
well enough on such hardware that anyone would get to the point of
noticing occasional userland instability.  Another plausible theory is
that the case did not happen to manifest until some recent code change
in libpthread and/or compiler code generation for it.

In any case, I'm pretty sure that changing hppa_ras' call as below
is a correct change to make.  (The printf additions are just to
document my debugging process, and are not meant for commit.)

I'm not too sure about protocol for this sort of thing; should I file
a send-pr report too?

			regards, tom lane

Index: sys/arch/hppa/hppa/hppa_machdep.c
===================================================================
RCS file: /cvsroot/src/sys/arch/hppa/hppa/hppa_machdep.c,v
retrieving revision 1.32
diff -u -r1.32 hppa_machdep.c
--- sys/arch/hppa/hppa/hppa_machdep.c	22 Aug 2021 20:18:39 -0000	1.32
+++ sys/arch/hppa/hppa/hppa_machdep.c	11 May 2022 03:40:38 -0000
@@ -113,6 +113,8 @@
 	ras_pc = (__greg_t)ras_lookup(l->l_proc,
 	    (void *)(gr[_REG_PCOQH] & ~HPPA_PC_PRIV_MASK));
 	if (ras_pc != -1) {
+	  printf("RAS: substitute 0x%08lx for 0x%08lx, 0x%08lx\n",
+		 ras_pc, gr[_REG_PCOQH], gr[_REG_PCOQT]);
 		ras_pc |= HPPA_PC_PRIV_USER;
 		gr[_REG_PCOQH] = ras_pc;
 		gr[_REG_PCOQT] = ras_pc + 4;
@@ -279,8 +281,11 @@
 
 	p = l->l_proc;
 	tf = l->l_md.md_regs;
-	rasaddr = (intptr_t)ras_lookup(p, (void *)tf->tf_iioq_head);
+	rasaddr = (intptr_t)ras_lookup(p,
+	    (void *)(tf->tf_iioq_head & ~HPPA_PC_PRIV_MASK));
 	if (rasaddr != -1) {
+	  printf("RAS2: substitute 0x%08lx for 0x%08x, 0x%08x\n",
+		 rasaddr, tf->tf_iioq_head, tf->tf_iioq_tail);
 		rasaddr |= HPPA_PC_PRIV_USER;
 		tf->tf_iioq_head = rasaddr;
 		tf->tf_iioq_tail = rasaddr + 4;

