Subject: Re: Several questions regarding the source
To: Michael Adda <michael_adda@hotmail.com>
From: Eduardo Horvath <eeh@turbolinux.com>
List: port-sparc64
Date: 07/05/2000 07:55:38
On Wed, 5 Jul 2000, Michael Adda wrote:

> On Tue, 4 Jul 2000, Eduardo Horvath wrote:
> >
> >The file sys/arch/sparc64/sparc64/asm.h is deprecated and to the best of
> >my knowledge unused.  The contents have all been moved to
> >sys/arch/sparc64/include/ctlreg.h.  I believe those issues have been fixed
> >in ctlreg.h.  If you do find any problems I would appreciate the
> >information.
> >
> I am currently examining ctlreg.h but I already seem to have found an
> anomaly. I am not sure that it has any implications for the kernel, though.
> I've taken the 32 bit variant of the macro ldda and altered it a bit.
> I did it in order to examine register usage in the produced assembler code.
> So the revised macro looks like :
> #define ldd(loc) ({                                                  \
>         register long long _lda_v, _loc_hi, _pstate;                 \
>         _loc_hi = (((u_int64_t)loc)>>32)+1;                          \
>         __asm __volatile("sllx %2,32,%0; or %0,%1,%0; ldd [%0],%0" : \
>                 "=&r" (_lda_v) : "r" ((unsigned long)(loc)) ,        \
>                  "r" (_loc_hi) );                                    \
>         _lda_v;                                                      \
> })
> As you can see, I've removed the privileged instructions, used ldd instead
> of ldda, and I added 1 to _loc_hi. That way I can check it easily.
> I then compiled the line "ldd ( 17);". The output I got was :
>         mov 17,%o2
>         mov 0,%o0
>         mov 1,%o1
>         sllx %o0,32,%g2; or %g2,%o2,%g2; ldd [%g2],%g2
> It seems that the compiler assigns %o2 and %o0 to loc itself, and %o1
> to _loc_hi. Despite the fact that _loc_hi is operand %2 in the asm
> directive, the generated code uses %o0 for it rather than %o1.
> The compilation was done with "gcc foo.c -S -O2". I used both
> egcs-2.90.29 980515 (egcs-1.0.3 release), on an ultra linux box
> (ultrapenguin), and gcc-2.95.2, a cross compiler for sparc running on
> i386 OpenBSD 2.6.
> This behavior increased my suspicion, so I compiled the original macro
> invoked as "ldda(0x100000011,13)" with the line "gcc foo.c -S -O2".
> The assembly code was:
>        mov 17,%o4
>        mov 0,%o2
>        mov 1,%o3
>        mov 13,%o5
>        wr %o5,%g0,%asi; sllx %o2,32,%o0; rdpr %pstate,%g2;
>        or %o0,%o4,%o0; wrpr %g2,8,%pstate; ldda [%o0]%asi,%o0;
>        wrpr %g2,0,%pstate
> Again, _loc_hi is silently ignored. This happens when compiling without
> optimization as well.
> I've tested this behavior with a variable instead of an immediate, but
> the same problem occurred: _loc_hi is always set to zero. Compiling
> "ldda (x,19)" produced:
>         x is in %o0 and %o1, as a function returned it.
>         mov 19,%l0
>         mov %o0,%o3
>         mov 0,%o2
>         wr %l0,%g0,%asi; sllx %o2,32,%g2; rdpr %pstate,%o4;
>         or %g2,%o1,%g2; wrpr %o4,8,%pstate; ldda [%g2]%asi,%g2;
>         wrpr %o4,0,%pstate
> This too, was checked on both compilers mentioned above.
> Can you validate the phenomenon described above ?

Hmm.  Interesting.  I'll need to look into this.
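
One possible reading of the listings above (an assumption, not verified
against the compiler sources): in 32-bit code the compiler keeps the
long long _loc_hi in an even/odd register pair with the high word in the
first register, and %2 names only that first register.  The portable C
sketch below models this; compose_address, reg_pair and to_pair are
illustrative names and not part of ctlreg.h.

```c
#include <stdint.h>

/* What "sllx hi,32,tmp; or tmp,lo,tmp" is meant to compute. */
static uint64_t compose_address(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/*
 * Hypothetical model of a 64-bit value held in two 32-bit registers:
 * the even (first) register holds the high word, the odd (second)
 * register holds the low word.
 */
struct reg_pair {
    uint32_t even;  /* high 32 bits */
    uint32_t odd;   /* low 32 bits */
};

static struct reg_pair to_pair(uint64_t v)
{
    struct reg_pair p;

    p.even = (uint32_t)(v >> 32);
    p.odd = (uint32_t)v;
    return p;
}
```

For "ldd(17)" the macro sets _loc_hi to 1, so the intended address is
compose_address(1, 17).  Under the assumption above, to_pair(1) puts 0 in
the first register and 1 in the second, and shifting the first register
gives compose_address(0, 17), which matches the "mov 0,%o0; mov 1,%o1;
sllx %o0,32,%g2" sequence in the listing.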

ldd and ldda are deprecated on sparcv9.  Apparently, in addition to
possible performance issues, ldda does not properly load data using
little-endian accesses on UltraSPARC I and II processors; the endianness in
each register of the pair is reversed, but the data between the two
registers is not swapped.  And AFAIK ldda is never used in sparc64
kernels.  Therefore I think it makes sense to remove ldda completely.
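
To make the little-endian problem concrete, here is a small portable C
model of the behaviour described above.  It is only a sketch of the
description, not a test run on real hardware, and the function names are
made up for illustration.

```c
#include <stdint.h>

/* Assemble a 32-bit word with the first byte in memory least significant. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* What a correct 64-bit little-endian load of 8 bytes should return. */
static uint64_t load_le64_expected(const uint8_t *p)
{
    return ((uint64_t)le32(p + 4) << 32) | le32(p);
}

/*
 * The behaviour described: each register of the pair is byte-reversed,
 * but the first word in memory still lands in the first (high) register.
 */
static uint64_t load_le64_as_described(const uint8_t *p)
{
    return ((uint64_t)le32(p) << 32) | le32(p + 4);
}

/* Exchanging the two 32-bit halves recovers the expected value. */
static uint64_t swap_halves(uint64_t v)
{
    return (v << 32) | (v >> 32);
}
```

With the bytes 01 02 03 04 05 06 07 08 in memory, the expected value is
0x0807060504030201, while the model of the described behaviour yields
0x0403020108070605.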

> One more question. Is there any documentation about the reasons that led
> to using DCACHE_BUG in ctlreg.h (force flushing of the D$ line) ?
> Is it a processor bug or something bogus in the NetBSD kernel ?

This is an interesting issue which finally seems to have been documented
in the errata section of the UltraSPARC IIi manual.  The data cache
uses virtual addresses but does not have any address space information in
the tags, and apparently it will operate on MMU bypass accesses and use
the physical address of the access, so if you use one of the bypass ASIs
and there is a data cache hit you will get data from the data cache
instead of the physical cache.  Conversely, the data cache may also cache
writes using bypass ASIs.  Hence it is necessary to flush the cache line
before and after all MMU bypass accesses.
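
The false hit can be illustrated with a toy model in portable C.  This is
a deliberately simplified sketch, not a description of the real hardware:
the direct-mapped layout, the translate() mapping and all names here are
assumptions chosen only to show why the flush is needed.

```c
#include <stdint.h>

#define NLINES   4   /* toy direct-mapped cache, one word per line */
#define MEMWORDS 16  /* toy physical memory size in words */

struct line {
    int valid;
    uint32_t tag;   /* address only, no address space information */
    uint32_t data;
};

static uint32_t phys_mem[MEMWORDS];
static struct line dcache[NLINES];

/* Hypothetical MMU mapping: virtual word v lives at physical word v + 8. */
static uint32_t translate(uint32_t va)
{
    return (va + 8) % MEMWORDS;
}

static void dcache_flush_line(uint32_t addr)
{
    dcache[addr % NLINES].valid = 0;
}

/* Normal load: look up by virtual address, fill the line on a miss. */
static uint32_t load_virtual(uint32_t va)
{
    struct line *l = &dcache[va % NLINES];

    if (!(l->valid && l->tag == va)) {
        l->valid = 1;
        l->tag = va;
        l->data = phys_mem[translate(va)];
    }
    return l->data;
}

/*
 * Bypass load: the cache is looked up with the physical address, and
 * nothing in the tag distinguishes it from a virtual address.
 */
static uint32_t load_bypass(uint32_t pa)
{
    struct line *l = &dcache[pa % NLINES];

    if (l->valid && l->tag == pa)
        return l->data;         /* false hit on a virtual line */
    return phys_mem[pa];
}

/* Flush the line before and after the bypass access. */
static uint32_t load_bypass_flushed(uint32_t pa)
{
    uint32_t v;

    dcache_flush_line(pa);
    v = load_bypass(pa);
    dcache_flush_line(pa);      /* no-op in this model; mirrors the rule */
    return v;
}
```

In this model, after load_virtual(3) has cached the word for virtual
address 3, load_bypass(3) returns that cached word rather than the
contents of physical word 3, while load_bypass_flushed(3) returns the
physical contents.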

Eduardo Horvath