Port-mips archive
Re: Anyone working on Octeon SMP?
On Sunday, 19 April 2026 11:22:00 EDT Nick Hudson wrote:
> On 16/04/2026 13:24, Andrew Parker wrote:
> > On Monday, 13 April 2026 12:58:34 EDT Kevin Bowling wrote:
> >> On Fri, Apr 10, 2026 at 11:45 PM Nick Hudson <nick.hudson%gmx.co.uk@localhost> wrote:
> >>> On 10/04/2026 11:18, Kevin Bowling wrote:
> >>>> On Mon, Apr 6, 2026 at 6:42 AM Nick Hudson <nick.hudson%gmx.co.uk@localhost> wrote:
> >>>>> On 06/04/2026 10:49, Kevin Bowling wrote:
> >>>>>> On Sat, Mar 28, 2026 at 8:28 AM Andrew Parker <andrew%pmk1.net@localhost> wrote:
> >>>>>>> On Thursday, 19 March 2026 23:31:36 EDT Kevin Bowling wrote:
> >>>>>>>> On Sun, Feb 2, 2025 at 6:52 AM Andrew Parker <andrew%pmk1.net@localhost> wrote:
> >>>>>>>>> On 1/29/25 06:23, Nick Hudson wrote:
> >>>>>>>>>> On 20/12/2024 22:01, Andrew Parker wrote:
> >>>>>>>>>>> Hi, I've been working towards getting a MULTIPROCESSOR build more
> >>>>>>>>>>> stable on my ER4 and would like to know if anyone else may be
> >>>>>>>>>>> working on the same?
> >>>>>>>>>>>
> >>>>>>>>>>> I've found a few areas that were causing instability on my machine
> >>>>>>>>>>> and have some patches that help (but mostly just for debugging at
> >>>>>>>>>>> this point). If there's any interest in teaming up and exchanging
> >>>>>>>>>>> ideas or patches for SMP on Octeon please let me know.
> >>>>>>>>>>
> >>>>>>>>>> Sure.
> >>>>>>>>>>
> >>>>>>>>>> I've dropped the ball on this and said I had a couple of fixes in mind,
> >>>>>>>>>> but done nothing to share them. Hopefully we can make it stable.
> >>>>>>>>>>
> >>>>>>>>>> Nick
> >>>>>>>>>
> >>>>>>>>> Great! I'm curious about what you have in mind for improvement. I've
> >>>>>>>>> mostly been looking around TLB invalidation and perhaps a missing
> >>>>>>>>> memory barrier.
> >>>>>>>>>
> >>>>>>>>> Anyway, I'll work on getting some stuff cleaned up and contact you
> >>>>>>>>> directly if that works for you.
> >>>>>>>>
> >>>>>>>> I'm interested in this as well, is there anything to share around
> >>>>>>>> current status or issues as well as if anything is pending out of
> >>>>>>>> tree?
> >>>>>>>
> >>>>>>> I hoped to spend more time on this over the winter but ended up moving
> >>>>>>> and some of my networking gear is still packed up.
> >>>>>>>
> >>>>>>> It's been a slow process getting my test environment set up again but
> >>>>>>> it would be great to pick this back up...especially if there's
> >>>>>>> continued interest in it.
> >>>>>>>
> >>>>>>> Give me a week or two to see what I can dig up and I'll be in touch.
> >>>>>>
> >>>>>> I've made some progress but it turned into a much deeper hole than I
> >>>>>> was anticipating. I can increase SMP stability with some changes in
> >>>>>> pmap_tlb.c to add some icache syncs
> >>>>>
> >>>>> I have a change I'll commit soon to handle EXECness (more) correctly.
> >>>
> >>> This is committed
> >>>
> >>> https://mail-index.netbsd.org/source-changes/2026/04/10/msg161522.html
> >>>
> >>>>> Am I right in thinking at least some Octeon processors' icaches are VIVT?
> >>>>> I've forgotten most of the mips stuff I knew... If so there will be
> >>>>> more flushing required for VIVT.
> >>>>
> >>>> VIPT. But it has an assortment of fun issues.
> >>>
> >>> fun issues?
> >>
> >> It seems to require pretty deliberate management that is different
> >> than other MIPS and archs.
> >>
> >> I was able to get SMP stabilized to my own satisfaction last night,
> >> insofar as it can hold up under the cnmac driver modifications, which is
> >> what I originally started with.
> >>
> >> There are a few categories of mandatory fixes: membar_release is
> >> missing a SYNC_PLUNGER (second syncw), INT_MASKs critically missing in
> >> octeon_intr.c, and a variety of locore fixups. Beyond that I ended up
> >> redoing octeon_intr.c to distribute interrupts and fix my previous
> >> octeon III patch. I have changes to the PMAP_TLB_NEED_SHOOTDOWN path
> >> primarily that I am least confident about. I'll try and organize
> >> everything into a more deliberate patch series, right now it is pretty
> >> messy with attempts and debugging.
> >
> > I was finally able to boot up my ER-4 last night. I don't have anything
> > to add (yet) to the pmap discussion here except I'm guessing I'm in a
> > similar place to you with it. Things are 'stable' mostly through a bunch
> > of extra TLB flushes. The other change that resulted in a more stable
> > userland for me is what appears to be a missing memory barrier around
> > cpu_lwp_setprivate(). The one I added in lwp.c doesn't seem 100% correct
> > but this does resolve instability that's easily reproduced in unbound
> > (using multiple threads) and occasionally sshd:
> > diff --git a/sys/arch/mips/mips/cpu_subr.c b/sys/arch/mips/mips/cpu_subr.c
> > index a80304908774..df854cae4254 100644
> > --- a/sys/arch/mips/mips/cpu_subr.c
> > +++ b/sys/arch/mips/mips/cpu_subr.c
> > @@ -1051,11 +1051,11 @@ cpu_vmspace_exec(lwp_t *l, vaddr_t start, vaddr_t end)
> >  int
> >  cpu_lwp_setprivate(lwp_t *l, void *v)
> >  {
> > -
> >  #if (MIPS32R2 + MIPS64R2) > 0
> >  	if (l == curlwp && MIPS_HAS_USERLOCAL) {
> >  		mipsNN_cp0_userlocal_write(v);
> >  	}
> > +	membar_sync();
> >  #endif
> >  	return 0;
> >  }
> >
> > diff --git a/sys/kern/sys_lwp.c b/sys/kern/sys_lwp.c
> > index 7c4e4f27ad23..24cc3315f3e4 100644
> > --- a/sys/kern/sys_lwp.c
> > +++ b/sys/kern/sys_lwp.c
> > @@ -187,7 +187,7 @@ sys__lwp_self(struct lwp *l, const void *v, register_t *retval)
> >  int
> >  sys__lwp_getprivate(struct lwp *l, const void *v, register_t *retval)
> >  {
> > -
> > +	membar_sync();
> >  	*retval = (uintptr_t)l->l_private;
> >  	return 0;
> >  }
>
> Thanks for testing.
>
> I can’t say I understand why these memory barriers are needed. Can you explain,
> please?

I can try =) I was working on getting unbound stable using multiple threads
to run DNS over TLS. With higher thread counts I'd get a segfault just after
starting the daemon.

ktruss showed multiple lwp_getprivate and lwp_setprivate calls just before
the segfault. My assumption was that unbound is using l_private for
synchronization and perhaps that value was 'stale'.

I agree the location of these may not be correct (especially the MI one).
Maybe these calls happen frequently enough to mask an issue elsewhere, but I'm
unable to reproduce the segfault with them in place. Perhaps it will give us
a clue on a better place to investigate?
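
A smaller reproducer than unbound might help narrow this down. Below is a rough userland sketch, not something I have run on the ER-4. It assumes that l_private backs the thread pointer used for compiler-level TLS (`__thread`, a GCC/Clang extension), so that a stale or wrong thread pointer would show up as a thread reading another thread's TLS slot. The names `worker`, `run_tls_stress`, `NTHREADS` and `NITERS` are made up for illustration.

```c
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

#define NTHREADS 8        /* hypothetical: raise above the CPU count */
#define NITERS   100000   /* hypothetical: checks per thread */

/* volatile so the compiler reloads the TLS slot on every check */
static __thread volatile uintptr_t tls_tag;

static void *
worker(void *arg)
{
	uintptr_t id = (uintptr_t)arg;
	uintptr_t bad = 0;

	tls_tag = id;
	for (int i = 0; i < NITERS; i++) {
		/* yield now and then to encourage context switches/migration */
		if ((i & 1023) == 0)
			sched_yield();
		/* a mismatch would mean we read some other thread's TLS */
		if (tls_tag != id)
			bad++;
	}
	return (void *)bad;
}

/* Returns the total number of mismatches, or -1 if thread creation failed. */
static long
run_tls_stress(void)
{
	pthread_t t[NTHREADS];
	int started = 0;
	int failed = 0;
	long total = 0;

	for (; started < NTHREADS; started++) {
		void *tag = (void *)(uintptr_t)(started + 1);
		if (pthread_create(&t[started], NULL, worker, tag) != 0) {
			failed = 1;
			break;
		}
	}
	for (int i = 0; i < started; i++) {
		void *res;
		if (pthread_join(t[i], &res) == 0)
			total += (long)(uintptr_t)res;
	}
	return failed ? -1 : total;
}
```

Calling run_tls_stress() from a small main() and printing the result should give 0 on a healthy system. If it ever returns a non-zero count on an unpatched MULTIPROCESSOR kernel, that would point at the thread pointer itself rather than at anything unbound-specific. If it stays at 0, the barriers are more likely masking a problem somewhere else.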