Re: some mips changes for loongson2

To: Simon Burge <simonb%NetBSD.org@localhost>
Subject: Re: some mips changes for loongson2
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Wed, 10 Aug 2011 20:56:35 +0200
On Wed, Aug 10, 2011 at 10:28:11PM +1000, Simon Burge wrote:
> [...]
> > Index: mips/lock_stubs_ras.S
> > ===================================================================
> 
> > + * to work around the branch prediction engine misbehavior of
> > + * Loongson 2F processors we need to clear the BTB before a j ra.
> > + * This requires extra instructions which don't fit in the RAS blocks,
> > + * so do a PC-relative just to a block of code (this is the same size as
> > + * a j ra) where we can let the assembler install the workaround.
> 
> Perhaps spell out BTB as "branch target buffer"?  I needed google to
> expand that.

Sure, I updated my source tree.

> 
> > @@ -89,7 +103,7 @@
> >  
> >  EXPORT(_lock_ras_start)
> >  STATIC_LEAF(ras_atomic_cas_noupdate)
> > -   j       ra
> > +   RETURN
> >      move   v0, t0
> >  END(ras_atomic_cas_noupdate)
> >  
> 
> I _think_ I'd like to see a "j ra" comment on this, but not entirely
> sure, as we don't use the "RETURN" convention anywhere else.  That said,
> this hardly looks great either:
> 
>       STATIC_LEAF(ras_atomic_cas_noupdate)
>               RETURN                  /* j ra   on most CPUs */
>                move   v0, t0
>       END(ras_atomic_cas_noupdate)

I renamed it to J_RA as suggested by David.

> 
> > @@ -223,6 +237,13 @@ END(ras_mutex_vector_exit)
> >  
> >     .p2align LOG2_MIPS_LOCK_RAS_SIZE        /* Get out of the RAS block */
> >  
> > +   .set at
> > +#ifdef MIPS3_LOONGSON2F
> > +loongson_return:
> > +   j       ra
> > +    nop
> > +#endif
> > +
> 
> Is this chunk missing the corresponding ".set noat"?

No, it's on purpose. AFAIK there's no issue using the at register from
here till the end of the file.

> 
> > Index: mips/mipsX_subr.S
> > ===================================================================
> >  [ ... ]
> > @@ -339,6 +385,7 @@ VECTOR(MIPSX(tlb_miss), unknown)
> >     bltz    k0, MIPSX(kernelfault)          #02: k0<0 -> 4f (kernel fault)
> >      PTR_SRL k0, 1*(PGSHIFT-PTR_SCALESHIFT)+(PGSHIFT-2)#03: k0=seg offset 
> > (almost)
> >     PTR_L   k1, %lo(CPUVAR(PMAP_SEG0TAB))(k1)#04: k1=seg0tab
> > +#endif /* MIPS3_LOONGSON2 */
> 
> Since that's the end of an else, I believe the convention for the endif
> comment would be /* !MIPS3_LOONGSON2 */.

You're probably right (I never clearly seen what the convention is :),
I've changed my sources.

> 
> > @@ -374,8 +421,13 @@ MIPSX(tlb_miss_common):
> >  #endif
> >     eret                                    #1f: return from exception
> >     .set    at
> > +#ifdef MIPS3_LOONGSON2
> > +_VECTOR_END(MIPSX(xtlb_miss))
> > +#else
> >  _VECTOR_END(MIPSX(tlb_miss))
> > +#endif
> >  
> > +#ifndef MIPS3_LOONGSON2
> >  #if defined(USE_64BIT_CP0_FUNCTIONS)
> >  /*
> >   * mipsN_xtlb_miss routine
> > @@ -391,9 +443,18 @@ _VECTOR_END(MIPSX(tlb_miss))
> >   *
> >   * Don't check for invalid pte's here. We load them as well and
> >   * let the processor trap to load the correct value after service.
> > + *
> > + * Loongson2 CPUs don't have separate tlbmiss and xtlbmiss, so we have
> > + * to check the address size here and branch to tlb_miss if needed.
> >   */
> >  VECTOR(MIPSX(xtlb_miss), unknown)
> >     .set    noat
> > +   lui     k1, %hi(_C_LABEL(xtlb_miss))
> > +   addiu   k1, %lo(_C_LABEL(xtlb_miss))
> > +   INT_L   k0, 0(k1)
> > +   INT_ADDU k0, 1
> > +   INT_S   k0, 0(k1)
> > +
> >     dmfc0   k0, MIPS_COP_0_BAD_VADDR        #00: k0=bad address
> >  #ifdef _LP64
> >     nop                                     #01: nop
> > @@ -423,6 +484,7 @@ _VECTOR_END(MIPSX(xtlb_miss))
> >  #else
> >     .space  128
> >  #endif /* USE_64BIT_CP0_FUNCTIONS */
> > +#endif /* !MIPS3_LOONGSON2 */
> 
> If I'm reading that diff correctly, are you inside a !MIPS3_LOONGSON2
> block so any Loongson2 specific code won't be used?  Specifically, I
> don't want to see the non-Loongson2 codepath longer in xtlb_miss if it
> doesn't have to be.

This is leftover debug code (counters to see if tlb handler were
called at all) which I forgot to remove. It's now gone.

> 
> 
> > @@ -1421,6 +1483,7 @@ NESTED_NOPROFILE(MIPSX(systemcall), CALL
> >  /*
> >   * Call the system call handler.
> >   */
> > +   .set    at
> >     jalr    t9
> >      move   a0, MIPS_CURLWP                 # 1st arg is curlwp
> >  
> > @@ -1436,7 +1499,6 @@ NESTED_NOPROFILE(MIPSX(systemcall), CALL
> >     lui     ra, %hi(MIPSX(user_return))     # return directly to user return
> >     j       _C_LABEL(ast)
> >      PTR_ADDIU ra, %lo(MIPSX(user_return))  # return directly to user return
> > -   .set    at
> >  END(MIPSX(systemcall))
> 
> My MIPS is too rusty - are there any code implications for the
> non-Loongson2 case by moving the ".set at" earlier?

AFAIK, no. I don't know in which circunstances the assembler would use the
at register, but from what I've seen, if the assembler can deal with
code in a noat section, the same code in a at section won't use the at
register either.

> 
> > @@ -1706,6 +1768,8 @@ END(MIPSX(tlb_invalid_exception))
> >     .globl  _C_LABEL(MIPSX(exceptionentry_end))
> >  _C_LABEL(MIPSX(exceptionentry_end)):
> >  
> > +   .set    at
> > +
> 
> Similarly, does this introduce any code implications?
> 
> > @@ -1774,8 +1838,8 @@ LEAF(MIPSX(tlb_update))
> >     tlbwi                                   # update slot found
> >     COP0_SYNC
> >  #ifdef MIPS3_LOONGSON2
> > -   li      k0, MIPS_DIAG_ITLB_CLEAR
> > -   mtc0    v0, MIPS_COP_0_DIAG             # invalidate ITLB
> > +   li      t1, MIPS_DIAG_ITLB_CLEAR
> > +   mtc0    t1, MIPS_COP_0_DIAG             # invalidate ITLB
> 
> Without understanding the Loongson2 code, this appears to be changing
> the behaviour of this code (and similar code chunks below this).  In the
> existing code k0 is loaded but v0 is moved to the MIPS_COP_0_DIAG reg.
> In your patch t1 is used for both.  I'm guessing that the previous code
> was buggy and using the same register is now correct?

I forgot to comment on this, sorry. Yes, the previous code is buggy:
in several case we load the value to write to cop0 in k0 and transfer
v0 to cop0 instead. In addition, this specific case is also clobbering
the return value of tlb_update(). using t1 here is safe.

> 
> > Index: lib/libc/arch/mips/atomic/atomic_add.S
> > ===================================================================
> > RCS file: /cvsroot/src/common/lib/libc/arch/mips/atomic/atomic_add.S,v
> > retrieving revision 1.2
> > diff -u -p -u -r1.2 atomic_add.S
> > --- lib/libc/arch/mips/atomic/atomic_add.S  14 Dec 2009 00:38:59 -0000      
> > 1.2
> > +++ lib/libc/arch/mips/atomic/atomic_add.S  10 Aug 2011 10:17:52 -0000
> > @@ -30,12 +30,22 @@
> >  #include <machine/asm.h>
> >  #include "atomic_op_asm.h"
> >  
> > +
> >  RCSID("$NetBSD: atomic_add.S,v 1.2 2009/12/14 00:38:59 matt Exp $")
> 
> Extra blank line?

Removed

> 
> >  
> >     .text
> > -   .set    noat
> >     .set    noreorder
> > +#ifdef _KERNEL_OPT
> > +#include "opt_cputype.h"
> > +#ifndef MIPS3_LOONGSON2F
> > +   .set    noat
> >     .set    nomacro
> > +#endif
> > +#else /* _KERNEL_OPT */
> > +   .set    noat
> > +   .set    nomacro
> > +#endif /* _KERNEL_OPT */
> > +
> 
> This fragment is repeated in multiple files and would be good to only do
> it once.  But I can't think of a way of doing that offhand...
> 
> > Index: lib/libc/arch/mips/string/bcopy.S
> > ===================================================================
> > RCS file: /cvsroot/src/common/lib/libc/arch/mips/string/bcopy.S,v
> > retrieving revision 1.3
> > diff -u -p -u -r1.3 bcopy.S
> > --- lib/libc/arch/mips/string/bcopy.S       14 Dec 2009 00:39:00 -0000      
> > 1.3
> > +++ lib/libc/arch/mips/string/bcopy.S       10 Aug 2011 10:17:53 -0000
> > @@ -170,8 +170,10 @@ LEAF(FUNCTION)
> >     PTR_ADDU        DSTREG,1
> >  
> >  4: # copydone
> > +   .set at         #-mfix-loongson2f-btb
> >     j       ra
> >     nop
> > +   .set noat
> >  
> >     /*
> >      *      Copy from unaligned source to aligned dest.
> > @@ -264,8 +266,10 @@ LEAF(FUNCTION)
> >     PTR_SUBU        DSTREG,1
> >  
> >  4: # copydone
> > +   .set at         #-mfix-loongson2f-btb
> >     j       ra
> >     nop
> > +   .set noat
> >  
> >     /*
> >      *      Copy from unaligned source to aligned dest.
> 
> See my earlier comment about code generation on non-Loongson2 CPUs.
> 
> 
> Other than getting your Loongson2 machine working, my main concern is
> changing code for the non-Loongson2 where it doesn't need to be.

there won't be any code change here. I'm not even sure why this is
in .noat at all, I can't see any problems in using the at register
in this functions. But I did prefer to keep the changes to generated code
as small as possible, to not add extra issues while I'm dealing with
loongson-specific ones :)

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--
Follow-Ups:
- Re: some mips changes for loongson2
  - From: Simon Burge
References:
- some mips changes for loongson2
  - From: Manuel Bouyer
- Re: some mips changes for loongson2
  - From: Simon Burge
Prev by Date: Re: some mips changes for loongson2
Next by Date: binutil hacks for loongson2
Previous by Thread: Re: some mips changes for loongson2
Next by Thread: Re: some mips changes for loongson2
Indexes:
Home | Main Index | Thread Index | Old Index