tech-kern archive


Re: Scheduling problem - need some help here




Hi Folks

A reminder that the issue in this thread (and recorded in this PR: http://gnats.netbsd.org/55415 ) is still very much outstanding and will result in the vax port being nearly unusable in NetBSD 10+. There is a 'fix' mentioned in the PR, but it's not clear it's the right solution.

Any/all thoughts on this are welcome.

Thanks.

Later...

Greg Oster

On 2020-07-28 06:01, Anders Magnusson wrote:
Hi,

Den 2020-07-28 kl. 13:28, skrev Nick Hudson:
On 28/06/2020 16:11, Anders Magnusson wrote:
Hi,

there is a problem (on vax) that I do not really understand. Greg Oster
filed a PR on it (#55415).

A while ago ad@ removed the  "(ci)->ci_want_resched = 1;" from
cpu_need_resched() in vax/include/cpu.h.
And as I read the code (in kern_runq.c) it shouldn't be needed,
ci_want_resched should be set already when the macro cpu_need_resched()
is invoked.

But without setting ci_want_resched = 1 the vax performs really badly (as
described in the PR).

ci_want_resched may hold multiple flag values nowadays; setting it to 1 will
effectively clear out the other flags, which is probably what makes it work.
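
To make the flag-clobbering point concrete, here is a minimal sketch in C. The flag names and values are hypothetical (prefixed DEMO_) and are not the kernel's actual definitions; the only thing it shows is how a plain assignment of 1 differs from OR-ing in a single bit.

```c
#include <assert.h>

/* Hypothetical flag bits, for illustration only; not the kernel's values. */
#define DEMO_RESCHED_REMOTE   0x01u
#define DEMO_RESCHED_IDLE     0x02u
#define DEMO_RESCHED_UPREEMPT 0x04u

struct demo_cpu_info {
    unsigned int ci_want_resched;   /* bitmask of pending resched requests */
};

/* What the removed line did: overwrite the whole word with 1. */
static void demo_assign_one(struct demo_cpu_info *ci)
{
    ci->ci_want_resched = 1;
}

/* What flag-aware code would do instead: add one bit, keep the rest. */
static void demo_or_flag(struct demo_cpu_info *ci, unsigned int flag)
{
    ci->ci_want_resched |= flag;
}

/* Walks through both cases and checks the resulting bitmask. */
static void demo_flags_selfcheck(void)
{
    struct demo_cpu_info a = { DEMO_RESCHED_IDLE | DEMO_RESCHED_UPREEMPT };
    struct demo_cpu_info b = { DEMO_RESCHED_IDLE | DEMO_RESCHED_UPREEMPT };

    demo_assign_one(&a);                    /* IDLE and UPREEMPT are lost */
    assert(a.ci_want_resched == 0x01u);

    demo_or_flag(&b, DEMO_RESCHED_REMOTE);  /* all three bits survive */
    assert(b.ci_want_resched == 0x07u);
}
```

If the MI code later tests one of the other bits, the assignment form would hide it, which fits the guess above but does not prove it.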

Anyone know what is going on here (and can explain it to me)?

I'm no expert here, but I think the expectation is that each platform
has its own method to signal "ast pending" and eventually call userret
(and preempt) when it's set - see setsoftast/aston.
VAX has hardware ASTs (AST is actually a VAX operation). They work like this: if an AST has been requested, then the next time an REI to userspace is executed it takes an AST trap instead, and the trap handler then reschedules.
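
A toy model of the mechanism just described, in C, may help readers who don't know VAX. It is only a sketch: the mode and level values (3 for user mode, 4 for "no AST pending") are assumptions made for the example, the struct and function names are hypothetical, and clearing the request on delivery is a simplification of what the real trap handler does.

```c
#include <assert.h>

/* Assumed values for this sketch only. */
enum { DEMO_MODE_KERNEL = 0, DEMO_MODE_USER = 3 };
enum { DEMO_AST_OK = 3, DEMO_AST_NO = 4 };

struct demo_vax {
    int astlvl;      /* stands in for the PR_ASTLVL processor register */
    int ast_traps;   /* number of AST traps taken so far */
};

/* Post an AST: models mtpr(AST_OK, PR_ASTLVL). */
static void demo_post_ast(struct demo_vax *cpu)
{
    cpu->astlvl = DEMO_AST_OK;
}

/*
 * Model of REI: when returning to a mode at or above ASTLVL, an AST
 * trap is taken instead of simply resuming that mode.
 * Returns 1 if an AST trap was taken, 0 for a plain return.
 */
static int demo_rei(struct demo_vax *cpu, int target_mode)
{
    if (target_mode >= cpu->astlvl) {
        cpu->astlvl = DEMO_AST_NO;   /* simplification: request is consumed */
        cpu->ast_traps++;
        return 1;
    }
    return 0;
}

/* Walks through: no AST, AST posted, REI to kernel, REI to user. */
static void demo_ast_selfcheck(void)
{
    struct demo_vax cpu = { DEMO_AST_NO, 0 };

    assert(demo_rei(&cpu, DEMO_MODE_USER) == 0);    /* nothing posted */
    demo_post_ast(&cpu);
    assert(demo_rei(&cpu, DEMO_MODE_KERNEL) == 0);  /* stays pending */
    assert(demo_rei(&cpu, DEMO_MODE_USER) == 1);    /* trap on way to user */
    assert(demo_rei(&cpu, DEMO_MODE_USER) == 0);    /* already delivered */
    assert(cpu.ast_traps == 1);
}
```

The point of the model is that nothing happens at the time the AST is posted; the trap is deferred until the next REI that would land in user mode, for example the one at the end of an interrupt handler.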

As I don't understand vax I don't know what

    #define cpu_signotify(l)     mtpr(AST_OK,PR_ASTLVL)

is expected to do, but somehow it should result in userret() being called.
Yep, this is the way an AST is posted. Next time an REI is executed it will trap to the AST subroutine.

Other points are:

- vax cpu_need_resched doesn't seem to differentiate between locally
  running lwp and an lwp running on another cpu.
Most likely.  It has been 20 years since I wrote the MP code (and probably just as long since anyone last tested it), and at that time LWPs didn't exist in NetBSD.  I would be surprised if it still worked :-)

- I can't see how hardclock would result in userret being called, but
  like I said - I don't know vax.
When it returns from hardclock (via REI), it traps directly to the AST handler instead, if an AST is posted.
http://src.illumos.org/source/xref/netbsd-src/sys/arch/vax/vax/intvec.S#311
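
On the first point above (local versus remote CPU), the distinction could be sketched as follows. This is a hypothetical illustration in C, not the vax code: the names are invented, and it assumes that the AST level register is per-CPU, so a request aimed at another CPU has to be forwarded (here as a counted "IPI") and posted by that CPU itself.

```c
#include <assert.h>

struct demo_cpu {
    int id;
    int ast_posted;     /* stands in for writing AST_OK to the local ASTLVL */
    int ipis_received;  /* stands in for an inter-processor interrupt */
};

/*
 * Hypothetical resched hook: 'self' is the CPU executing this code,
 * 'target' is the CPU that should reschedule.
 */
static void demo_need_resched(struct demo_cpu *self, struct demo_cpu *target)
{
    if (target == self)
        target->ast_posted = 1;     /* local: post the AST directly */
    else
        target->ipis_received++;    /* remote: ask the other CPU to do it */
}

/* Runs on the remote CPU when the IPI arrives; posts the AST locally. */
static void demo_ipi_handler(struct demo_cpu *cpu)
{
    cpu->ast_posted = 1;
}

/* Walks through one local and one remote request. */
static void demo_mp_selfcheck(void)
{
    struct demo_cpu cpu0 = { 0, 0, 0 };
    struct demo_cpu cpu1 = { 1, 0, 0 };

    demo_need_resched(&cpu0, &cpu0);    /* local request */
    assert(cpu0.ast_posted == 1 && cpu0.ipis_received == 0);

    demo_need_resched(&cpu0, &cpu1);    /* remote request */
    assert(cpu1.ast_posted == 0 && cpu1.ipis_received == 1);

    demo_ipi_handler(&cpu1);            /* cpu1 posts its own AST */
    assert(cpu1.ast_posted == 1);
}
```

A hook that always writes the local register, whatever the target, would behave like the first branch only, which is the behaviour the first point describes.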

I believe ci_want_resched is an MI variable for the scheduler which is
why its use in vax cpu_need_resched got removed.
It shouldn't be needed, but obviously something breaks if it isn't added.

What I think may have happened is that someone optimized something in the MI code so that it now expects different behaviour than the VAX hardware ASTs provide.  AFAIK VAX is (almost) the only port that has hardware ASTs.

Thanks for at least looking at this.

-- Ragge

