NetBSD-Bugs archive
port-hppa/56837: RAS support is slightly incorrect within hppa kernel, too
>Number: 56837
>Category: port-hppa
>Synopsis: RAS support is slightly incorrect within hppa kernel, too
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-hppa-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat May 14 20:40:00 +0000 2022
>Originator: Tom Lane
>Release: HEAD/202205021430Z (problem is old, though)
>Organization:
PostgreSQL Global Development Group
>Environment:
NetBSD sss2.sss.pgh.pa.us 9.99.96 NetBSD 9.99.96 (GENERIC) #3: Sat May 14 14:36:00 EDT 2022 tgl%nuc1.sss.pgh.pa.us@localhost:/home/tgl/netbsd-H-202205021430Z/obj.hppa/sys/arch/hppa/compile/GENERIC hppa
>Description:
This is a follow-on to PR port-hppa/56830: I've discovered that there's a similar hazard inside the kernel itself. sys/arch/hppa/hppa/lock_stubs.S contains a sequence that sys/arch/hppa/hppa/intr.c intends to make atomic by restarting, but the restart check mishandles the edge case in much the same way as hppa_ras did:
	if (frame->tf_iisq_head == HPPA_SID_KERNEL &&
	    frame->tf_iioq_head >= (u_int)_lock_cas_ras_start &&
	    frame->tf_iioq_head <= (u_int)_lock_cas_ras_end) {
		frame->tf_iioq_head = (u_int)_lock_cas_ras_start;
		frame->tf_iioq_tail = (u_int)_lock_cas_ras_start + 4;
Here, we know that the low-order PC bits should be zero, so there's no need for masking; but it's still the case that this will opt to reset tf_iioq_head and tf_iioq_tail if tf_iioq_head is exactly equal to _lock_cas_ras_start. As before, in that case this does nothing to tf_iioq_head but it seems to risk changing tf_iioq_tail inappropriately.
>How-To-Repeat:
I have not actually been able to produce a problem here. I tried adding this inside the if block:
	    frame->tf_iioq_head <= (u_int)_lock_cas_ras_end) {
+		if (frame->tf_iioq_tail != frame->tf_iioq_head + 4)
+			printf("lock_cas needs RAS to %p but we have 0x%08x, 0x%08x\n",
+			    _lock_cas_ras_start,
+			    frame->tf_iioq_head, frame->tf_iioq_tail);
		frame->tf_iioq_head = (u_int)_lock_cas_ras_start;
and what I got was very infrequent bleats like
[ 9347.196452] lock_cas needs RAS to 0x224068 but we have 0x00224070, 0x0022407c
AFAICS that case is valid, BTW: it corresponds to what you'd expect if the interrupt were taken immediately after _lock_cas's "comb,<>" instruction had decided to branch. tf_iioq_head is pointing at the instruction in the branch's delay slot, which should be executed but hasn't been yet, and tf_iioq_tail is pointing at the branch destination which should be the next thing to execute. So what this trace proves is only that the RAS reset does need to happen sometimes, but there's no evidence that we ever reach the initial "ldw" with tf_iioq_tail different from PC+4.
However ... I sure wouldn't want to bet that that couldn't happen, either today with some other test load than I've been using, or in future after some kernel code rearrangement or other. We know it *does* happen in userspace and there seems no good reason to assume it can't ever happen in the kernel.
>Fix:
A minimal fix is just to turn the >= test into >:
Index: intr.c
===================================================================
RCS file: /cvsroot/src/sys/arch/hppa/hppa/intr.c,v
retrieving revision 1.6
diff -u -r1.6 intr.c
--- intr.c 26 Feb 2022 03:02:25 -0000 1.6
+++ intr.c 14 May 2022 18:28:39 -0000
@@ -339,7 +339,7 @@
 	extern char _lock_cas_ras_end[];
 	if (frame->tf_iisq_head == HPPA_SID_KERNEL &&
-	    frame->tf_iioq_head >= (u_int)_lock_cas_ras_start &&
+	    frame->tf_iioq_head > (u_int)_lock_cas_ras_start &&
 	    frame->tf_iioq_head <= (u_int)_lock_cas_ras_end) {
 		frame->tf_iioq_head = (u_int)_lock_cas_ras_start;
 		frame->tf_iioq_tail = (u_int)_lock_cas_ras_start + 4;
Personally, though, I'd also change the "<=" to "<" and move the _lock_cas_ras_end label to compensate:
Index: intr.c
===================================================================
RCS file: /cvsroot/src/sys/arch/hppa/hppa/intr.c,v
retrieving revision 1.6
diff -u -r1.6 intr.c
--- intr.c 26 Feb 2022 03:02:25 -0000 1.6
+++ intr.c 14 May 2022 18:30:45 -0000
@@ -339,8 +339,8 @@
 	extern char _lock_cas_ras_end[];
 	if (frame->tf_iisq_head == HPPA_SID_KERNEL &&
-	    frame->tf_iioq_head >= (u_int)_lock_cas_ras_start &&
-	    frame->tf_iioq_head <= (u_int)_lock_cas_ras_end) {
+	    frame->tf_iioq_head > (u_int)_lock_cas_ras_start &&
+	    frame->tf_iioq_head < (u_int)_lock_cas_ras_end) {
 		frame->tf_iioq_head = (u_int)_lock_cas_ras_start;
 		frame->tf_iioq_tail = (u_int)_lock_cas_ras_start + 4;
 	}
Index: lock_stubs.S
===================================================================
RCS file: /cvsroot/src/sys/arch/hppa/hppa/lock_stubs.S,v
retrieving revision 1.28
diff -u -r1.28 lock_stubs.S
--- lock_stubs.S 20 Mar 2022 20:19:34 -0000 1.28
+++ lock_stubs.S 14 May 2022 18:30:45 -0000
@@ -171,8 +171,8 @@
 	ldw	0(%arg0),%t1
 	comb,<>	%arg1, %t1, 1f
 	 copy	%t1,%ret0
-_lock_cas_ras_end:
 	stw	%arg2,0(%arg0)
+_lock_cas_ras_end:
 	copy	%arg1,%ret0
 1:
 	bv,n	%r0(%rp)
The point of this second change is to make these range checks match the style of the ones in ras_lookup(), instead of being confusingly different from them.
I've been running with this second patch for a little while and not seen any problems.