Port-vax archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: NetBSD in SIMH vax with 512 MB RAM



On Fri 17 Mar 2023 at 10:54:46 +1100, Kalvis Duckmanton wrote:
> On 15/3/23 18:01, Jan-Benedict Glaw wrote:
> > On Thu, 2023-03-09 12:39:19 +1100, Kalvis Duckmanton <kalvisd%gmail.com@localhost> wrote:
> > > I have no explanation for why this might be a problem only for large memory
> > > configurations, though.
> > I'm so keen to see if this works. But without understanding what's
> > actually happening, that is probably not ready to be merged upstream?

I searched for other occurrences of CASMAGIC and found this one in
arch/vax/vax/trap.c:

	case T_PTELEN:
#ifndef MULTIPROCESSOR
		/*
		 * If we referred to an address beyond the end of the system
		 * page table, it may be due to a failed CAS
		 * restartable-atomic-sequence.  If it is, restart it at the
		 * beginning and restart.
		 */
		{
			extern const uint8_t cas32_ras_start[], cas32_ras_end[];
			if (tf->tf_code == CASMAGIC
			    && tf->tf_pc >= (uintptr_t) cas32_ras_start
			    && tf->tf_pc < (uintptr_t) cas32_ras_end) {
				tf->tf_pc = (uintptr_t) cas32_ras_start;
				trapsig = false;
				break;
			}
		}
		/* FALLTHROUGH */
#endif

The idea of Restartable Atomic Sequences is that you do NOT need to lock
out interrupts (so in that regard this patch is totally the wrong thing
to do), but instead you detect when an interrupt hits inside the
sequence. If so, you restart it from the beginning.

So let's remind outself of the sequence from vax/lock_stubs.S:

#else
/*
 * entry:
 *	r1 = address to be CAS'ed
 *	r2 = old value
 *	r3 = new value
 *	r4 = global cell to hold CAS address (common to all callers)
 *	     e.g. address of curcpu()->ci_cas_addr
 * exit:
 *	r0 = old value
 */
	.globl	cas32_ras_start, cas32_ras_end
cas32_ras_start:
	movl	%r1, (%r4)
	movl	*(%r4), %r0
	cmpl	%r2, %r0
	bneq	1f
	movl	%r3, *(%r4)
cas32_ras_end:
1:
	movl	$CASMAGIC, (%r4)
	rsb
#endif /* !MULTIPROCESSOR */

This address in r4 can perfectly be used for this: if an interrupt moves
an invalid address into the memory location, the sequence will trap
because of it, at any of the 2 places where *(%r4) is used (because of
the double indirection). The code in trap.c will detect this case and
move the PC back to cas32_ras_start.

There are several weak points in this story:

1. there is no obvious code anywhere (except here!) that stores
   $CASMAGIC into (%r4). So an interrupt doesn't automatically cause the
   RAS to trap and restart.

   The counter to that might be that somebody assumed that any interrupt
   that is "significant enough to bother about", will use this cas32
   function too, and hence poison the pointer with CASMAGIC at the end.
   Personally, I would want to be a bit more sure that each interrupt
   *really* poisons the pointer.

2. I think if an interrupt hits just when the first instruction of the
   sequence is about to be executed, and poisons the pointer, it won't
   cause the sequence to trap because the poisoned pointer gets
   overwritten. 

   As a counter-argument one could say that perhaps that place isn't
   considered "in" the sequence yet, even though it would satisfy
   "tf->tf_pc >= (uintptr_t) cas32_ras_start" from trap.c.

3. If an interrupt hits after "movl    *(%r4), %r0" and then next cmpl
   compares "not equal", the second access to *(%r4) gets skipped and no
   trap/restart occurs.

   Counter: luckily, for CAS, a "not equal" return code (actual old
   value differs from expected old value) represents a failure and
   should cause a re-try anyway. 

4. Perhaps this is the reason why the whole thing doesn't work if you
   have a lot of RAM: if we take the comment "beyond the end of the
   system page table" at face value, then maybe such a large memory
   configuration causes the page tables to grow enough that the T_PTELEN
   trap does not occur. Instead there would be some other trap type.

> kalvis
-Olaf.
-- 
___ "Buying carbon credits is a bit like a serial killer paying someone else to
\X/  have kids to make his activity cost neutral." -The BOFH    falu.nl@rhialto

Attachment: signature.asc
Description: PGP signature



Home | Main Index | Thread Index | Old Index