Port-sparc archive


Re: xrender in cg14 & SBusFPGA (was:Re: CG14 with xrender)



Hello,

On Wed, 24 Aug 2022 19:17:59 +0200
Romain Dolbeau <romain%dolbeau.org@localhost> wrote:

> The original 8/24 bit FB had been prototyped in SBusFPGA, but a lot of
> work went into making the NuBusFPGA "reasonably fast", so I backported
> that variant to the SBusFPGA to get an accelerated 8/24 (8 bits in
> console, 24 bits in X11) framebuffer.

Nice!

> And I've started adding 'render' support - or rather, EXA Composite.
> Which is not the best documented piece of software :-(

Indeed.

> Having used the cg14 code as 'inspiration', some thoughts and questions.

That code / hardware is probably not a good example for anything
straight and simple - if you want your brain to melt, look at the
various CG14Copy8*() variants.
The reason there are so many of them is that we try to use 32bit
accesses and as many registers as possible for every operation, omit
memory reads if possible and deal with misaligned source and
destination pixmaps.
The result is somewhere between a plain CG6 and a TGX in terms of speed.
CG14Copy32() is far simpler...
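To illustrate the general idea of using 32bit accesses for 8bit pixels when source and destination are aligned differently, here is a minimal sketch in plain C. It is not taken from the cg14 driver and does not use SX; `copy8_wide` is a hypothetical name, and the real CG14Copy8*() variants are considerably more involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not from the cg14 driver: copy n 8-bit pixels,
 * using one 32-bit access per four pixels once dst is word-aligned.
 */
static void copy8_wide(uint8_t *dst, const uint8_t *src, size_t n)
{
    /* Head: single bytes until dst reaches a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)dst & 3) != 0) {
        *dst++ = *src++;
        n--;
    }

    /*
     * Body: four pixels at a time. src may still be misaligned here,
     * so the word is moved with memcpy, which compilers typically
     * lower to plain word loads and stores where that is legal.
     */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, src, sizeof w);
        memcpy(dst, &w, sizeof w);
        dst += 4;
        src += 4;
        n -= 4;
    }

    /* Tail: whatever is left, byte by byte. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

Even this simplified version needs three phases; specialising each alignment combination, as the driver does, is where the number of variants comes from.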

> On Wed, 11 May 2022 at 23:09, Michael <macallan%netbsd.org@localhost> wrote:
> > (...) I'll have to read up on what exactly things
> > like PictOpSrc with no source operand but a mask are supposed to do.  
> 
> From my reading of render, the ((src IN msk) OP dst) uses Alpha of 1.0
> and colors of 0.0 by default, so I read this specific combination as
> setting the dst to the msk alpha and 0 color.
> (src OpIn msk) will apply the msk's alpha to src, so that produces (mskA,0,0,0)
> Then the OpSrc operator will just copy that to dst.
> I can't try that on my system as I don't see that operator applied
> after adding OpOver and OpAdd.

I wrote the xrender code in cg14 mostly by watching which operations
things like Windowmaker, gtk2 and kde3 trigger, then filling them in as
much as feasible, then messing with it until the result looks right.
All while learning SX asm and adding new instructions as needed.
This is very much a work in (slow) progress.
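For reference, the reading of PictOpSrc with a mask but no source quoted above can be sketched in plain C. This only models that interpretation (default source of alpha 1.0 and colour 0.0, then IN, then Src); it is not what the cg14 code does and has not been checked against the Render spec. The function names and the 0xAARRGGBB packing are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: (src IN mask) for the assumed default source,
 * returned as a premultiplied pixel packed as 0xAARRGGBB.
 */
static uint32_t src_in_mask_default(uint8_t mask_alpha)
{
    /* Assumed default source: alpha 1.0 (0xff), colour 0.0. */
    const uint32_t src = 0xff000000u;
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = (src >> shift) & 0xff;
        /* Rounded c * mask_alpha / 255, with [0,255] standing for [0,1]. */
        uint32_t t = c * mask_alpha + 0x80;
        t = (t + (t >> 8)) >> 8;
        out |= t << shift;
    }
    return out;
}

/* PictOpSrc: the result simply replaces the destination. */
static void op_src_masked(uint32_t *dst, const uint8_t *mask, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src_in_mask_default(mask[i]);
}
```

Under that reading, every destination pixel ends up as (mskA, 0, 0, 0), regardless of what was there before.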

> Also, 'rendercheck' fails a lot ... even without any acceleration :-(
> Endianness issue?

Possible, I never really used it. 

> Also a question on the SX instructions; the name of the multiplication
> instruction sort of implies it does "(a*b)>>8" ((a*b)/256), but render
> says [0,255] is [0,1], so the computation needed is (a*b)/255 if I
> read the spec correctly?

Yeah, I'm aware. SX isn't exactly fast by today's standards, the
multiplication instruction gives us the >>8 bit for free and the
difference is probably not really visible...

> (I've added a custom 4x8 SIMD FMA to my RISC-V core to do ((a*b)/255
> +| c), where +| is the saturated addition ; as I understand it the SW
> does saturation by clamping during the 'store' ?). Without the
> 'appropriate' division, 'rendercheck -t gtk_argb_xbgr' complains that
> the result is 1 too low in every channel; e.g. (0x44*0xff)>>8 is
> 0x43, but should be pass-through so 0x44.

Yeah, I didn't think spending the extra SX cycles to get that right was
worth it, there are bigger problems like misrendered gtk3 buttons.
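The off-by-one is easy to reproduce in plain C. The sketch below is a scalar model only, neither SX code nor the RISC-V instruction mentioned in the quote; all function names are hypothetical. It compares the >>8 shortcut with a rounded division by 255 that needs only shifts and adds, and models the per-channel ((a*b)/255 +| c) operation.

```c
#include <stdint.h>

/* The shortcut: treats 256 as 1.0, so multiplying by 0xff loses a bit. */
static uint8_t mul_shift8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((uint32_t)a * b) >> 8);
}

/* Rounded a * b / 255 without a divide. */
static uint8_t mul_div255(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Saturating 8-bit add: clamps at 0xff instead of wrapping. */
static uint8_t add_sat8(uint8_t a, uint8_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint8_t)(s > 0xff ? 0xff : s);
}

/*
 * Hypothetical scalar model of a 4x8 multiply-add: for each of the
 * four packed 8-bit channels, compute (a * b) / 255 +| c.
 */
static uint32_t fma_4x8(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t p = mul_div255((uint8_t)((a >> shift) & 0xff),
                               (uint8_t)((b >> shift) & 0xff));
        uint8_t r = add_sat8(p, (uint8_t)((c >> shift) & 0xff));
        out |= (uint32_t)r << shift;
    }
    return out;
}
```

With these definitions, mul_shift8(0x44, 0xff) gives 0x43 and mul_div255(0x44, 0xff) gives 0x44, which matches the rendercheck complaint. The exact version costs two extra adds and one extra shift per channel.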

> And I don't see the cg14 code checking for 'filter' or 'transform',
> are they never set? I think if they are non-NULL the code should
> currently decline to accelerate. But I've yet to see either non-NULL
> on my system, I'm not sure if that's actually used...

I haven't seen them filled in either, but I do try to weed out
operations we don't accelerate (and probably never will) early on.

have fun
Michael

