Port-mips archive
Re: Anyone working on Octeon SMP?
On Mon, Apr 6, 2026 at 6:42 AM Nick Hudson <nick.hudson%gmx.co.uk@localhost> wrote:
>
> On 06/04/2026 10:49, Kevin Bowling wrote:
> > On Sat, Mar 28, 2026 at 8:28 AM Andrew Parker <andrew%pmk1.net@localhost> wrote:
> >>
> >> On Thursday, 19 March 2026 23:31:36 EDT Kevin Bowling wrote:
> >>> On Sun, Feb 2, 2025 at 6:52 AM Andrew Parker <andrew%pmk1.net@localhost> wrote:
> >>>> On 1/29/25 06:23, Nick Hudson wrote:
> >>>>> On 20/12/2024 22:01, Andrew Parker wrote:
> >>>>>> Hi, I've been working towards getting a MULTIPROCESSOR build more
> >>>>>> stable
> >>>>>> on my ER4 and would like to know if anyone else may be working on the
> >>>>>> same?
> >>>>>>
> >>>>>> I've found a few areas that were causing instability on my machine and
> >>>>>> have some patches that help (but mostly just for debugging at this
> >>>>>> point). If there's any interest in teaming up and exchanging ideas or
> >>>>>> patches for SMP on Octeon please let me know.
> >>>>>
> >>>>> Sure.
> >>>>>
> >>>>> I've dropped the ball on this and said I had a couple of fixes in mind,
> >>>>> but done nothing to share them. Hopefully we can make it stable.
> >>>>>
> >>>>> Nick
> >>>>
> >>>> Great! I'm curious about what you have in mind for improvement. I've
> >>>> mostly been looking around TLB invalidation and perhaps a missing memory
> >>>> barrier.
> >>>>
> >>>> Anyway, I'll work on getting some stuff cleaned up and contact you
> >>>> directly if that works for you.
> >>>
> >>> I'm interested in this as well. Is there anything to share about
> >>> current status or issues, or anything pending out of tree?
> >>
> >> I hoped to spend more time on this over the winter but ended up moving and
> >> some of my networking gear is still packed up.
> >>
> >> It's been a slow process getting my test environment set up again, but it would
> >> be great to pick this back up...especially if there's continued interest in
> >> it.
> >>
> >> Give me a week or two to see what I can dig up and I'll be in touch.
> > I've made some progress but it turned into a much deeper hole than I
> > was anticipating. I can increase SMP stability with some changes in
> > pmap_tlb.c to add some icache syncs
>
> I have a change I'll commit soon to handle EXECness (more) correctly.
>
> Am I right in thinking that at least some Octeon processors' icaches are VIVT?
> I've forgotten most of the MIPS stuff I knew... If so, more flushing will be
> required for VIVT.
VIPT. But it has an assortment of fun issues.
> > and guarding around an assert that
> > is easy to trigger
> > @@ -735,7 +767,9 @@ pmap_tlb_shootdown_bystanders(pmap_t pm)
> >  	 * And best of all, we avoid an IPI.
> >  	 */
> >  	KASSERT(!kernel_p);
> > -	pmap_tlb_pai_reset(ti, pai, pm);
> > +	if (pai->pai_asid > KERNEL_PID) {
> > +		pmap_tlb_pai_reset(ti, pai, pm);
> > +	}
>
>
> you mean this KASSERT in pmap_tlb_pai_reset?
Yeah. But let me think harder about this.
> 252 /*
> 253 * We must have an ASID but it must not be onproc (on a processor).
> 254 */
> 255 KASSERT(pai->pai_asid > KERNEL_PID);
>
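For readers following the exchange above, here is a minimal sketch of why the proposed guard sidesteps the assertion. It is a toy model, not the NetBSD code: every name prefixed with toy_ or TOY_ is hypothetical, and the assumption that a reset leaves the ASID at the kernel value is mine, not something the thread states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for KERNEL_PID; the value 0 is an assumption. */
#define TOY_KERNEL_PID 0u

/* Toy stand-in for the per-TLB ASID info; asid models pai->pai_asid. */
struct toy_pai {
	unsigned asid;
};

/*
 * Models the reset path: like the quoted KASSERT, it requires that an
 * ASID is currently assigned before releasing it.
 */
static void
toy_pai_reset(struct toy_pai *pai)
{
	assert(pai->asid > TOY_KERNEL_PID);
	pai->asid = TOY_KERNEL_PID;	/* assumed: reset releases the ASID */
}

/*
 * Models the guarded call from the diff: only reset when an ASID is
 * assigned. Returns true if a reset actually happened.
 */
static bool
toy_pai_release_if_assigned(struct toy_pai *pai)
{
	if (pai->asid > TOY_KERNEL_PID) {
		toy_pai_reset(pai);
		return true;
	}
	return false;
}
```

Calling toy_pai_release_if_assigned() twice on the same structure resets it once and then does nothing, whereas calling toy_pai_reset() twice would trip the assertion. The guard hides the symptom; whether a second reset should ever be attempted is the open question in the thread.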
> > I think there are a variety of pmap changes needed. But I wonder if
> > any MIPS SMP or other PMAP_TLB_NEED_SHOOTDOWN has been heavily
> > exercised?
>
> Almost certainly not.
Good to know, I was a bit shy about looking at the MI pmap at first.
> > IPI and TLB stuff is probably "simpler" on ARM. On
> > FreeBSD (13), OpenBSD, Linux the MIPS TLB shootdowns are synchronous.
>
> No other NetBSD architecture (not sure of powerpc booke status actually) uses
> the PMAP_TLB_NEED_SHOOTDOWN stuff.
>
> I'm about to switch aarch64 to sys/uvm/pmap which uses architecture defined
> broadcast TLB operations. RISC-V uses SBI remote fence operations.
Arch defined broadcast sounds very suitable for what we'll need to do
here. Is that out of tree?
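As a rough illustration of the synchronous versus deferred distinction raised above, the following single-threaded toy model may help. It assumes a "post request, then service it" structure and represents waiting by running the target's handler directly. All names are hypothetical, and it does not reflect how any of the mentioned kernels implement shootdowns.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NCPU 4	/* matches the 4-core ER4 only for flavour */

struct toy_cpu {
	bool tlb_has_entry;	/* a stale translation is still cached */
	bool pending;		/* an invalidate request has been posted */
};

/* Post an invalidate request (stands in for queueing work and an IPI). */
static void
toy_post(struct toy_cpu *cpu)
{
	cpu->pending = true;
}

/* What the target CPU does when it gets around to the request. */
static void
toy_service(struct toy_cpu *cpu)
{
	if (cpu->pending) {
		cpu->tlb_has_entry = false;
		cpu->pending = false;
	}
}

/* Deferred: post to every other CPU and return immediately. */
static void
toy_shootdown_deferred(struct toy_cpu cpus[], int self)
{
	for (int i = 0; i < TOY_NCPU; i++)
		if (i != self)
			toy_post(&cpus[i]);
}

/* Synchronous: post, then do not return until every target is done. */
static void
toy_shootdown_sync(struct toy_cpu cpus[], int self)
{
	toy_shootdown_deferred(cpus, self);
	for (int i = 0; i < TOY_NCPU; i++)
		if (i != self)
			while (cpus[i].pending)
				toy_service(&cpus[i]);
}

/* Count the other CPUs that still hold the stale entry. */
static int
toy_stale_count(const struct toy_cpu cpus[], int self)
{
	int n = 0;

	for (int i = 0; i < TOY_NCPU; i++)
		if (i != self && cpus[i].tlb_has_entry)
			n++;
	return n;
}
```

In this model the deferred variant returns while the other three CPUs still hold the stale entry, and the synchronous variant returns only once none do. That window is where ordering bugs can hide in a deferred design.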
> > My test is iperf3 -P4 --bidir against the ER4 (4 cores). With my
> > current changes I can get it to run about 900 seconds, where before it
> > often would die in a few or a dozen seconds. I'm having trouble
> > catching the ultimate source of the instability, whether it is TLB or
> > something else.
>
> I always use kernel build as my test fwiw.
The iperf bidir test is really good at knocking this thing over. But I will
try some other workloads once this is solid.
I learned about 'wdogctl -p 3 -k wdog0', it saves me from having to go
unplug it most of the time when the CPUs get stuck.
> Nick