Subject: RE: Instruction question: bbXX .vs. insv
To: 'Matt Thomas' <matt@3am-software.com>
From: Antonio Carlini <arcarlini@iee.org>
List: port-vax
Date: 07/18/2003 17:58:53
> In a number of places, NetBSD/vax uses the construct of
>
>     bb{sc,cc,cs,cc} <BITNO>,<DST>,label
> label:

For CVAX, the timings are like this:

Work out the timings for each operand (bearing in mind that the
last operand is different) and add on the timing for the instruction
itself.

<BITNO> and <DST> will take the same number of "operand cycles" for
either form; best case is 1 microcycle each (for short literal or
register access).

BBxx has <label> as the final operand and that costs 1 microcycle plus
1 read cycle, which is listed as 1+1r in the table. If the opcode and
the first specifier of the instruction at the branch target overlap,
throw in one extra microcycle. So here you are likely to have a
cost so far of 2+1r.

For a register DST the following additional costs apply:

(no branch): 6
(branch):    8 + 1r

For a memory DST it becomes

(no branch): 7 + 1r + 1w
(branch):    7 + 2r + 1w

So the total cost for BBxx with register DST, no overlap at the label
and no branch: 9 + 1r

For a BBxx with memory DST and an overlapping label and a taken
branch: 11 + 3r + 1w.

(Plus the cost of the two operands that were omitted above as common).
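
As a sanity check on the arithmetic above, here is a small Python sketch
that adds up the components. The names (LABEL, OVERLAP, COMMON_OPERANDS
and so on) are mine, purely for illustration, and the figures are only
the ones quoted in this message. One assumption to flag: the two totals
above come out exactly when <BITNO> and <DST> are counted at their best
case of 1 microcycle each, so the sketch includes them that way. That is
inferred from the arithmetic, not read from the timing table.

```python
# Costs are (microcycles, reads, writes) tuples, using the CVAX figures
# quoted in this message. All names here are illustrative.

def add(*costs):
    """Sum any number of (microcycles, reads, writes) tuples."""
    return tuple(sum(parts) for parts in zip(*costs))

def fmt(cost):
    """Render a cost tuple in the table's notation, e.g. '11 + 3r + 1w'."""
    micro, reads, writes = cost
    text = str(micro)
    if reads:
        text += " + %dr" % reads
    if writes:
        text += " + %dw" % writes
    return text

LABEL = (1, 1, 0)          # branch displacement operand: 1+1r
OVERLAP = (1, 0, 0)        # extra microcycle when the target overlaps
REG_NO_BRANCH = (6, 0, 0)  # register DST, branch not taken
REG_BRANCH = (8, 1, 0)     # register DST, branch taken
MEM_NO_BRANCH = (7, 1, 1)  # memory DST, branch not taken
MEM_BRANCH = (7, 2, 1)     # memory DST, branch taken

# Assumption: best-case <BITNO> and <DST> specifiers, 1 microcycle each.
COMMON_OPERANDS = (2, 0, 0)

reg_no_branch = add(LABEL, REG_NO_BRANCH, COMMON_OPERANDS)
mem_branch_overlap = add(LABEL, OVERLAP, MEM_BRANCH, COMMON_OPERANDS)

print("register DST, no overlap, no branch:", fmt(reg_no_branch))
print("memory DST, overlap, branch taken:  ", fmt(mem_branch_overlap))
```

Swapping in REG_BRANCH or MEM_NO_BRANCH gives the other combinations on
the same footing.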

>  to set or clear individual bits.  Now that got me to
> wondering whether doing the following is an improvement:
>
>     insv {$1,$0},<BITNO>,$1,<DST>

INSV has two extra arguments: assuming a short literal for the
first, that's 1 microcycle. Same goes for the second. But DST
might actually count as 0 here as last operand if it is a
register. So, for simplicity, count the overall cost of
the two args as 1+1-1 = 1.

INSV is listed as 10-12, with 10 typical, for a register operand.
INSV is listed as 13 + 1r + 1w => 15 + 2r + 3w for memory
(with the shorter time being typical).
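
The same kind of tally for INSV, again as a rough Python sketch. The
names are mine and purely illustrative, and the numbers are only those
quoted above. It assumes the simplification just described (the two
extra arguments cost 1+1-1 = 1 microcycle overall) and applies it to the
memory case as well, which is an assumption on my part. The <BITNO> and
<DST> specifier costs common to both forms are left out.

```python
# Costs are (microcycles, reads, writes) tuples, using the CVAX figures
# quoted in this message. All names here are illustrative.

def add(*costs):
    """Sum any number of (microcycles, reads, writes) tuples."""
    return tuple(sum(parts) for parts in zip(*costs))

# Two extra short-literal operands, less one because a register DST
# counts as 0 when it is the last operand (the simplification above).
EXTRA_OPERANDS = (1 + 1 - 1, 0, 0)

INSV_REG_TYPICAL = (10, 0, 0)  # register operand, typical
INSV_REG_WORST = (12, 0, 0)    # register operand, upper end of 10-12
INSV_MEM_TYPICAL = (13, 1, 1)  # memory operand, typical: 13 + 1r + 1w
INSV_MEM_WORST = (15, 2, 3)    # memory operand, upper end: 15 + 2r + 3w

insv_reg_typical = add(INSV_REG_TYPICAL, EXTRA_OPERANDS)
insv_reg_worst = add(INSV_REG_WORST, EXTRA_OPERANDS)
insv_mem_typical = add(INSV_MEM_TYPICAL, EXTRA_OPERANDS)
insv_mem_worst = add(INSV_MEM_WORST, EXTRA_OPERANDS)

for name, cost in [
    ("register, typical", insv_reg_typical),
    ("register, worst", insv_reg_worst),
    ("memory, typical", insv_mem_typical),
    ("memory, worst", insv_mem_worst),
]:
    print("INSV %-18s microcycles=%d reads=%d writes=%d" % ((name,) + cost))
```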

So for CVAX, assuming I've counted right (and understood the
timings in the first place :-)) then BBxx looks faster for
registers/branch-not-taken. Beyond that, the savings
seem to drop away.

All of this is moot anyway, since the cache will make a huge difference
in all of this (except the cycle times I guess).

I have a 78032 (uVAX II) manual somewhere that probably lists timings
and they're probably simpler than the CVAX ones. I also have an NVAX
(or NVAX+) spec somewhere that *may* list such things but the timings
are probably even less useful for that case. Assuming I find these
manuals, I'll see what I can cook up.

I assume this is some frequently hit code that you want to change?

Antonio
--

---------------
Antonio Carlini             arcarlini@iee.org