Re: some mips changes for loongson2

To: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Subject: Re: some mips changes for loongson2
From: Simon Burge <simonb%NetBSD.org@localhost>
Date: Wed, 10 Aug 2011 22:28:11 +1000
Hi Manuel,

Manuel Bouyer wrote:

> Hello,
> I've been working on getting NetBSD running on a loongson2f-based
> mini-PC (lemote fuulong). There's already ls2f support for gdium but it's
> not complete (for example, it doesn't support 64bits, nor the ISA devices
> found on lemote hardware).
> Here's a first round of patches, which do 2 things:
> - LS2f doens't have separate vector address for tlb_miss and xtlb_miss
>   exeptions, the tlb_miss handler address is used for both (and the
>   vector can be twice as large as the xtlb_miss space is free for use).
>   mips.diff contains a patch for this, where the ls2f tlb_miss
>   checks if the fault address is in useg or xuseg and takes appropriate
>   action.
> - ls2f CPUs have a hardware bug which can make them hang on a jump register
>   in kernel mode. A workaround can be implemented in binutils which adds
>   2 instructions using the at register before each j(r) ra.
>   (right now I'm using a OpenBSD patch which does:
>   li at,3
>   dmtc0 at, $22
>   to clear the branch prediction logic before any jump register. Our binutils
>   have something sighly different, I've yet to look at it to see if it's
>   useable for us - in any case that's a different topic).
>   This workaround requires the use of at register before jump register,
>   so I patched the assembly in arch/mips and common/ to allow at use before
>   jump so that the assembler can install the workaroud.
> 
> Any comment about these 2 patches ?

Good to see another machine on the way to being supported!

Comments below.

> Index: mips/lock_stubs_ras.S
> ===================================================================

> + * to work around the branch prediction engine misbehavior of
> + * Loongson 2F processors we need to clear the BTB before a j ra.
> + * This requires extra instructions which don't fit in the RAS blocks,
> + * so do a PC-relative just to a block of code (this is the same size as
> + * a j ra) where we can let the assembler install the workaround.

Perhaps spell out BTB as "branch target buffer"?  I needed google to
expand that.

> @@ -89,7 +103,7 @@
>  
>  EXPORT(_lock_ras_start)
>  STATIC_LEAF(ras_atomic_cas_noupdate)
> -     j       ra
> +     RETURN
>        move   v0, t0
>  END(ras_atomic_cas_noupdate)
>  

I _think_ I'd like to see a "j ra" comment on this, but not entirely
sure, as we don't use the "RETURN" convention anywhere else.  That said,
this hardly looks great either:

        STATIC_LEAF(ras_atomic_cas_noupdate)
                RETURN                  /* j ra   on most CPUs */
                 move   v0, t0
        END(ras_atomic_cas_noupdate)

> @@ -223,6 +237,13 @@ END(ras_mutex_vector_exit)
>  
>       .p2align LOG2_MIPS_LOCK_RAS_SIZE        /* Get out of the RAS block */
>  
> +     .set at
> +#ifdef MIPS3_LOONGSON2F
> +loongson_return:
> +     j       ra
> +      nop
> +#endif
> +

Is this chunk missing the corresponding ".set noat"?

> Index: mips/mipsX_subr.S
> ===================================================================
>  [ ... ]
> @@ -339,6 +385,7 @@ VECTOR(MIPSX(tlb_miss), unknown)
>       bltz    k0, MIPSX(kernelfault)          #02: k0<0 -> 4f (kernel fault)
>        PTR_SRL k0, 1*(PGSHIFT-PTR_SCALESHIFT)+(PGSHIFT-2)#03: k0=seg offset 
> (almost)
>       PTR_L   k1, %lo(CPUVAR(PMAP_SEG0TAB))(k1)#04: k1=seg0tab
> +#endif /* MIPS3_LOONGSON2 */

Since that's the end of an else, I believe the convention for the endif
comment would be /* !MIPS3_LOONGSON2 */.

> @@ -374,8 +421,13 @@ MIPSX(tlb_miss_common):
>  #endif
>       eret                                    #1f: return from exception
>       .set    at
> +#ifdef MIPS3_LOONGSON2
> +_VECTOR_END(MIPSX(xtlb_miss))
> +#else
>  _VECTOR_END(MIPSX(tlb_miss))
> +#endif
>  
> +#ifndef MIPS3_LOONGSON2
>  #if defined(USE_64BIT_CP0_FUNCTIONS)
>  /*
>   * mipsN_xtlb_miss routine
> @@ -391,9 +443,18 @@ _VECTOR_END(MIPSX(tlb_miss))
>   *
>   * Don't check for invalid pte's here. We load them as well and
>   * let the processor trap to load the correct value after service.
> + *
> + * Loongson2 CPUs don't have separate tlbmiss and xtlbmiss, so we have
> + * to check the address size here and branch to tlb_miss if needed.
>   */
>  VECTOR(MIPSX(xtlb_miss), unknown)
>       .set    noat
> +     lui     k1, %hi(_C_LABEL(xtlb_miss))
> +     addiu   k1, %lo(_C_LABEL(xtlb_miss))
> +     INT_L   k0, 0(k1)
> +     INT_ADDU k0, 1
> +     INT_S   k0, 0(k1)
> +
>       dmfc0   k0, MIPS_COP_0_BAD_VADDR        #00: k0=bad address
>  #ifdef _LP64
>       nop                                     #01: nop
> @@ -423,6 +484,7 @@ _VECTOR_END(MIPSX(xtlb_miss))
>  #else
>       .space  128
>  #endif /* USE_64BIT_CP0_FUNCTIONS */
> +#endif /* !MIPS3_LOONGSON2 */

If I'm reading that diff correctly, are you inside a !MIPS3_LOONGSON2
block so any Loongson2 specific code won't be used?  Specifically, I
don't want to see the non-Loongson2 codepath longer in xtlb_miss if it
doesn't have to be.


> @@ -1421,6 +1483,7 @@ NESTED_NOPROFILE(MIPSX(systemcall), CALL
>  /*
>   * Call the system call handler.
>   */
> +     .set    at
>       jalr    t9
>        move   a0, MIPS_CURLWP                 # 1st arg is curlwp
>  
> @@ -1436,7 +1499,6 @@ NESTED_NOPROFILE(MIPSX(systemcall), CALL
>       lui     ra, %hi(MIPSX(user_return))     # return directly to user return
>       j       _C_LABEL(ast)
>        PTR_ADDIU ra, %lo(MIPSX(user_return))  # return directly to user return
> -     .set    at
>  END(MIPSX(systemcall))

My MIPS is too rusty - are there any code implications for the
non-Loongson2 case by moving the ".set at" earlier?

> @@ -1706,6 +1768,8 @@ END(MIPSX(tlb_invalid_exception))
>       .globl  _C_LABEL(MIPSX(exceptionentry_end))
>  _C_LABEL(MIPSX(exceptionentry_end)):
>  
> +     .set    at
> +

Similarly, does this introduce any code implications?

> @@ -1774,8 +1838,8 @@ LEAF(MIPSX(tlb_update))
>       tlbwi                                   # update slot found
>       COP0_SYNC
>  #ifdef MIPS3_LOONGSON2
> -     li      k0, MIPS_DIAG_ITLB_CLEAR
> -     mtc0    v0, MIPS_COP_0_DIAG             # invalidate ITLB
> +     li      t1, MIPS_DIAG_ITLB_CLEAR
> +     mtc0    t1, MIPS_COP_0_DIAG             # invalidate ITLB

Without understanding the Loongson2 code, this appears to be changing
the behaviour of this code (and similar code chunks below this).  In the
existing code k0 is loaded but v0 is moved to the MIPS_COP_0_DIAG reg.
In your patch t1 is used for both.  I'm guessing that the previous code
was buggy and using the same register is now correct?

> Index: lib/libc/arch/mips/atomic/atomic_add.S
> ===================================================================
> RCS file: /cvsroot/src/common/lib/libc/arch/mips/atomic/atomic_add.S,v
> retrieving revision 1.2
> diff -u -p -u -r1.2 atomic_add.S
> --- lib/libc/arch/mips/atomic/atomic_add.S    14 Dec 2009 00:38:59 -0000      
> 1.2
> +++ lib/libc/arch/mips/atomic/atomic_add.S    10 Aug 2011 10:17:52 -0000
> @@ -30,12 +30,22 @@
>  #include <machine/asm.h>
>  #include "atomic_op_asm.h"
>  
> +
>  RCSID("$NetBSD: atomic_add.S,v 1.2 2009/12/14 00:38:59 matt Exp $")

Extra blank line?

>  
>       .text
> -     .set    noat
>       .set    noreorder
> +#ifdef _KERNEL_OPT
> +#include "opt_cputype.h"
> +#ifndef MIPS3_LOONGSON2F
> +     .set    noat
>       .set    nomacro
> +#endif
> +#else /* _KERNEL_OPT */
> +     .set    noat
> +     .set    nomacro
> +#endif /* _KERNEL_OPT */
> +

This fragment is repeated in multiple files and would be good to only do
it once.  But I can't think of a way of doing that offhand...

> Index: lib/libc/arch/mips/string/bcopy.S
> ===================================================================
> RCS file: /cvsroot/src/common/lib/libc/arch/mips/string/bcopy.S,v
> retrieving revision 1.3
> diff -u -p -u -r1.3 bcopy.S
> --- lib/libc/arch/mips/string/bcopy.S 14 Dec 2009 00:39:00 -0000      1.3
> +++ lib/libc/arch/mips/string/bcopy.S 10 Aug 2011 10:17:53 -0000
> @@ -170,8 +170,10 @@ LEAF(FUNCTION)
>       PTR_ADDU        DSTREG,1
>  
>  4:   # copydone
> +     .set at         #-mfix-loongson2f-btb
>       j       ra
>       nop
> +     .set noat
>  
>       /*
>        *      Copy from unaligned source to aligned dest.
> @@ -264,8 +266,10 @@ LEAF(FUNCTION)
>       PTR_SUBU        DSTREG,1
>  
>  4:   # copydone
> +     .set at         #-mfix-loongson2f-btb
>       j       ra
>       nop
> +     .set noat
>  
>       /*
>        *      Copy from unaligned source to aligned dest.

See my earlier comment about code generation on non-Loongson2 CPUs.


Other than getting your Loongson2 machine working, my main concern is
changing code for the non-Loongson2 where it doesn't need to be.

Cheers,
Simon.
Follow-Ups:
- Re: some mips changes for loongson2
  - From: Manuel Bouyer
- Re: some mips changes for loongson2
  - From: David Holland
References:
- some mips changes for loongson2
  - From: Manuel Bouyer
Prev by Date: some mips changes for loongson2
Next by Date: Re: some mips changes for loongson2
Previous by Thread: some mips changes for loongson2
Next by Thread: Re: some mips changes for loongson2
Indexes:
Home | Main Index | Thread Index | Old Index