The following reply was made to PR port-vax/55415; it has been noted by GNATS.
From: Anders Magnusson <ragge%tethuvudet.se@localhost>
To: gnats-bugs%netbsd.org@localhost, oster%netbsd.org@localhost
Cc:
Subject: Re: port-vax/55415: vax no longer preempts in a timely fashion
Date: Thu, 30 Jul 2020 21:07:37 +0200
> I've done a bit more debugging... What I'm seeing is that in
> kern_runq.c:sched_resched_cpu() the call to cpu_need_resched(ci, l, f)
> happens, cpu_need_resched() sets up the AST. Except it's only once in a
> while that the trap with the AST fires, userret() gets called, and
> preemption happens! Sometimes the trap with AST fires once, and not
> again... sometimes it fires 5 times in a row, and then misses.... but I
> don't know why an AST that has been posted would subsequently get missed
> sometimes....
>
> So it's able to hit a situation where cpu_need_resched() is called, but
> the corresponding AST never fires. The loop in sched_resched_cpu() that
> sets ci->ci_want_resched keeps thinking (correctly!) that the AST has
> already been setup, and so doesn't try to call cpu_need_resched() again.
> When it gets 'stuck' like this, we never see an AST until the process
> completes. (nor do we see preemption until the process completes.)
> That seems to be because if I check the AST status with:
>
> if (mfpr(PR_ASTLVL) != AST_OK)
>
> that condition is always true... (meaning the AST is not setup...)
>
> Any ideas on how an AST can just 'disappear'? (I'm using the same
> mfpr() check right after the mtpr() setting of PR_ASTLVL, and there it
> thinks it's set just fine... so how does it go missing a few moments
> after????)
>
The AST is only acked if it has been taken. This is done in trap(),
just before userret() is called.
In normal operation, losing the AST should not be possible.
However, the VAX manual says that ASTLVL is not saved by svpctx, so if a
process switch occurs before the AST is delivered, it will be lost.
Can this ever happen?
Since ASTs are intended to cause the process switch, can a switch
nowadays be initiated from a higher interrupt level?
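To illustrate the concern, here is a toy user-space model (not kernel code)
of how a posted AST could vanish across a context switch. All toy_* names and
the numeric values are made up for this sketch, and it assumes that ldpctx
reloads ASTLVL from the incoming PCB while svpctx leaves the PCB copy
untouched:

```c
/* Toy ASTLVL values, chosen for this sketch only. */
enum { TOY_AST_OK = 3, TOY_AST_NO = 4 };

/* Hypothetical stand-in for the ASTLVL copy kept in a process's PCB. */
struct toy_pcb {
    int astlvl;
};

/* Models the processor's ASTLVL register. */
static int toy_astlvl_reg = TOY_AST_NO;

/* Models svpctx: context is saved, but ASTLVL is NOT written back. */
static void toy_svpctx(struct toy_pcb *pcb)
{
    (void)pcb; /* deliberately leaves pcb->astlvl stale */
}

/* Models ldpctx (assumption): ASTLVL is loaded from the incoming PCB. */
static void toy_ldpctx(const struct toy_pcb *pcb)
{
    toy_astlvl_reg = pcb->astlvl;
}

/* Models posting an AST in the register only, without updating the PCB. */
static void toy_post_ast(void)
{
    toy_astlvl_reg = TOY_AST_OK;
}

/*
 * Post an AST for process A, switch to B and back to A before the AST
 * is delivered, then report whether the AST is still pending (1) or
 * has been lost (0).
 */
int toy_ast_survives_switch(void)
{
    struct toy_pcb a = { TOY_AST_NO };
    struct toy_pcb b = { TOY_AST_NO };

    toy_ldpctx(&a);   /* A is running */
    toy_post_ast();   /* AST requested for A */

    toy_svpctx(&a);   /* switch away from A: ASTLVL not saved */
    toy_ldpctx(&b);   /* B's PCB overwrites the register */

    toy_svpctx(&b);   /* switch back to A */
    toy_ldpctx(&a);   /* A's stale PCB value is reloaded */

    return toy_astlvl_reg == TOY_AST_OK;
}
```

In this model toy_ast_survives_switch() returns 0, i.e. the AST is gone once
A runs again. Whether the real kernel can actually reach such a sequence is
exactly the open question.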
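You could add something like this to your code:

    int s;

    s = splhigh();                  /* block interrupts around the check */
    mtpr(AST_OK, PR_ASTLVL);        /* post the AST */
    if (mfpr(PR_ASTLVL) != AST_OK)  /* read it straight back */
        printf("ERROR\n");
    splx(s);                        /* restore the previous level */

and see whether you still get a missing AST.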
-- Ragge