Subject: Re: new sysctl: hw.cpu_isa
To: Simon Burge <simonb@wasabisystems.com>
From: Chris G. Demetriou <cgd@sibyte.com>
List: tech-kern
Date: 11/14/2000 09:25:40
simonb@wasabisystems.com (Simon Burge) writes:
> I'd like to add a new sysctl the returns the CPU ISA.  I would suspect
> that when we have SMP that the lowest ISA of any of the CPUs would be
> returned by this sysctl.  My primary inspiration for this is to allow
> selection of optimised libraries on ELF systems using /etc/ld.so.conf.
> For example, on my test alpha I have in /etc/ld.so.conf
> 
> 	libm.so.0	hw.cpu_arch	ev56:libm-ev56.so.0
> 	libc.so.12	hw.cpu_arch	ev56:libc-ev56.so.12
> 
> and the lib*-ev56 libraries have been compiled with -mcpu=ev56.  To
> quote a meaningless benchmark, this got me about a 10% speed improvement
> for the dhrystone benchmark on a 500MHz 21164a.

This isn't the right thing, at least for the purpose you're
describing.  You don't want to optimize libraries, and select them,
based on CPU, at least for the Alpha architecture.

You want to pick which CPU features are meaningful to the performance
of various libraries, and then compile and select based on them.  You
don't want to have to specify different lines, or even different
entries on a line, for CPUs which support a common set of features.
E.g. ev6 and ev67 would both use the bwx,precise_fp_traps libraries,
without any need to specify ev6, ev67, etc.

Perhaps what you're suggesting makes sense for CPUs or ISAs other than
Alpha.  For instance, on MIPS, you really do have the notion of
multiple ISA levels which work in the way you suggest (but even there
the notion of application-specific extensions to the ISAs is coming
in).  But it's not clear that this is a worthwhile MI feature intended
to be used for anything other than 'user information'.


> 	alpha	ev{4,5,56,6,67}

To be complete, you have forgotten:

	ev45, lca45, pca56

I'm not sure what combination of extensions is in each... but
certainly caches and insn scheduling change, and iirc pca56 didn't
include the same extensions as ev56 (but it's been a while since I
cared to recall 8-).


> So, anyone disagree with this whole idea or have any suggestions of
> improvements?

See above.  I think you've got a problem to solve here, but this is
the wrong tack to take to solve it.

To the extent that it's desirable to provide a generic indicator of
"CPU type" within an architecture, what you're suggesting may be fine.
And on some architectures, it may even be appropriate for
optimization.  I'm not convinced that it's a useful
machine-independent feature.

But for Alpha it's definitely not the right thing, and Bill
S. indicated the same for x86.


In a later message, you said:
> For my original intended use (shared library selection), a feature
> set doesn't scale.  If you've got 5 features, you don't want to have
> 32 difference libraries to choose from.  My initial thinking was that
> feature set info should be a MD thing and in a machdep.<something>
> sysctl.

I'd say you're looking at this wrong.

Don't optimize (or at least, don't put optimized libraries into the
default build 8-) until you know what it'll get you, and in particular
which optimizations are worthwhile.


For alpha, in particular, there are several things worth considering:

(1) differences caused by CPU implementations.  I.e. caches,
pipelines, etc.

(2) instruction set extensions worth optimizing for.

In the latter category, on the alpha, there are indeed 5
extensions defined, last I checked:

	BWX (byte/word accesses)
	FIX (square root + FP conversion)
	CIX (count extension)
	MVI (multimedia)
	precise FP traps.

It's definitely not worth N libraries for each of those.  What you
want to do is come up with:

(a) a mapping from CPUs -> extensions that matter,

(b) binaries built to optimize for specific extension combinations
(e.g. none, BWX, BWX+CIX+FIX+precise, etc.)

And then, for those extension combinations, come up with a reasonable
insn scheduling policy for that library.  e.g. none -> schedule for
ev4, bwx -> schedule for ev56, etc.
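
To make (a) and (b) concrete, here is a minimal sketch in C.  It is not
an existing kernel or ld.so interface: the EXT_* bits, both tables, the
"-bwx"-style suffixes, and the choice of ev6 scheduling for the
bwx+precise build are assumptions for illustration.  The per-CPU
extension sets should be checked against the CPU documentation before
anyone relies on them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Extension bits.  Names and values are illustrative, not a kernel API. */
enum {
	EXT_BWX     = 1 << 0,	/* byte/word accesses */
	EXT_FIX     = 1 << 1,	/* square root + FP conversion */
	EXT_CIX     = 1 << 2,	/* count extension */
	EXT_MVI     = 1 << 3,	/* multimedia */
	EXT_PRECISE = 1 << 4	/* precise FP traps */
};

/* (a) CPU -> extensions.  Assumed contents; verify before use. */
struct cpu_ext {
	const char *cpu;
	unsigned ext;
};

static const struct cpu_ext cpu_table[] = {
	{ "ev4", 0 }, { "ev45", 0 }, { "lca45", 0 }, { "ev5", 0 },
	{ "ev56",  EXT_BWX },
	{ "pca56", EXT_BWX | EXT_MVI },
	{ "ev6",   EXT_BWX | EXT_FIX | EXT_MVI | EXT_PRECISE },
	{ "ev67",  EXT_BWX | EXT_FIX | EXT_CIX | EXT_MVI | EXT_PRECISE },
};

/* (b) One build per extension combination, plus its scheduling target. */
struct lib_variant {
	unsigned ext;		/* combination this build was compiled for */
	const char *suffix;	/* hypothetical library name suffix */
	const char *sched;	/* CPU to schedule insns for */
};

static const struct lib_variant variants[] = {
	{ 0,                     "",             "ev4"  },
	{ EXT_BWX,               "-bwx",         "ev56" },
	{ EXT_BWX | EXT_PRECISE, "-bwx-precise", "ev6"  },	/* sched assumed */
};

/* Only these extensions were judged worth a separate build. */
#define EXT_THAT_MATTER	(EXT_BWX | EXT_PRECISE)
#define NELEM(a)	(sizeof(a) / sizeof((a)[0]))

/* Look up the extension set for a CPU name. */
unsigned
ext_for_cpu(const char *cpu)
{
	size_t i;

	assert(cpu != NULL);
	for (i = 0; i < NELEM(cpu_table); i++)
		if (strcmp(cpu_table[i].cpu, cpu) == 0)
			return cpu_table[i].ext;
	return 0;	/* unknown CPU: claim no extensions */
}

/* Pick the build whose combination equals the extensions that matter. */
const struct lib_variant *
variant_for_ext(unsigned ext)
{
	size_t i;

	ext &= EXT_THAT_MATTER;
	for (i = 0; i < NELEM(variants); i++)
		if (variants[i].ext == ext)
			return &variants[i];
	return &variants[0];	/* no exact build: fall back to baseline */
}
```

With these tables, ev6 and ev67 land on the same "-bwx-precise" build,
and pca56 lands on the plain "-bwx" build, because MVI, FIX and CIX are
masked out before the lookup.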

If you find that there are combinations that aren't worth serving
with a complete custom library (e.g. if you find that some processors
would be BWX, and others BWX+CIX, without the latter ones falling into
a separate category), then what you want is to overlay a library which
contains minimal stuff in 'front' of the normal library.

e.g. load like:

	BWX:	a libc compiled w/ BWX support.
	BWX + CIX:
		a libc stub containing functions meaningfully modified
		    by CIX (maybe just ffs() 8-), followed by
		a libc compiled w/ BWX support.
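
The load order above could be computed along these lines.  This is only
a sketch: the file names, the EXT_* bits and libc_load_order() are all
hypothetical, and it assumes the loader resolves each symbol from the
first library in the list that defines it, so the stub's ffs() would
shadow the one in the full library.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative extension bits, not a kernel API. */
enum {
	EXT_BWX = 1 << 0,	/* byte/word accesses */
	EXT_CIX = 1 << 1	/* count extension */
};

/* Hypothetical file names. */
static const char libc_base[]     = "libc.so.12";
static const char libc_bwx[]      = "libc-bwx.so.12";
static const char libc_cix_stub[] = "libc-cix-stub.so.12";

/*
 * Fill order[] with the libraries to search, earliest first.
 * Returns the number of entries written (1 or 2).
 */
size_t
libc_load_order(unsigned ext, const char *order[2])
{
	size_t n = 0;

	assert(order != NULL);

	/* The small overlay goes in 'front' of the complete library. */
	if ((ext & EXT_BWX) != 0 && (ext & EXT_CIX) != 0)
		order[n++] = libc_cix_stub;

	/* Exactly one complete libc always follows. */
	order[n++] = (ext & EXT_BWX) != 0 ? libc_bwx : libc_base;
	return n;
}
```

A BWX-only CPU gets one entry (the BWX libc); a BWX+CIX CPU gets the
stub followed by the same BWX libc, so only the stub has to be built
separately.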



My point is, it's not worth having 2^N libraries, but neither is it
typically worth having M, where M is the number of CPUs in a given
architecture.  Indeed, needing to list all M CPUs, when M can grow
and a CPU missing from the list doesn't get a best-effort choice,
indicates a bad architectural decision.

I.e. say I've got an ev99, that I just made my kernel run on.  It
supports BWX, CIX, FIX, precise FP traps, plus the new NIX extension
that lets it solve NP-complete problems in polynomial time.  With the
default configuration, which supports a pre-existing optimized library
for BWX+CIX+FIX+precise, it should get that library, without need to
add additional entries into ld.so.conf or suffer other similar tweaking.
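
One way to get that best-effort behaviour is to match on feature
subsets rather than CPU names: take the library that requires the most
extensions among those whose requirements the CPU fully meets, and
ignore any bits the table doesn't know about.  The sketch below assumes
that policy; the EXT_* bits, library names and best_library() are
hypothetical, not an existing ld.so mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative extension bits, not a kernel API. */
enum {
	EXT_BWX     = 1 << 0,
	EXT_FIX     = 1 << 1,
	EXT_CIX     = 1 << 2,
	EXT_MVI     = 1 << 3,
	EXT_PRECISE = 1 << 4,
	EXT_NIX     = 1 << 5	/* the made-up ev99 extension */
};

struct lib_variant {
	unsigned required;	/* extensions the build depends on */
	const char *name;	/* hypothetical library file name */
};

/* Pre-existing builds; none of them knows anything about NIX. */
static const struct lib_variant variants[] = {
	{ 0,                                           "libc.so.12" },
	{ EXT_BWX,                                     "libc-bwx.so.12" },
	{ EXT_BWX | EXT_CIX | EXT_FIX | EXT_PRECISE,
	    "libc-bwx-cix-fix-precise.so.12" },
};

#define NELEM(a)	(sizeof(a) / sizeof((a)[0]))

/* Count the set bits in v. */
static unsigned
count_bits(unsigned v)
{
	unsigned n = 0;

	while (v != 0) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Return the most specific build the CPU can run. */
const char *
best_library(unsigned cpu_ext)
{
	const struct lib_variant *best = &variants[0];
	size_t i;

	/* The table must start with the baseline, which always matches. */
	assert(variants[0].required == 0);

	for (i = 1; i < NELEM(variants); i++) {
		/* Skip builds needing an extension the CPU lacks. */
		if ((variants[i].required & ~cpu_ext) != 0)
			continue;
		if (count_bits(variants[i].required) >
		    count_bits(best->required))
			best = &variants[i];
	}
	return best->name;
}
```

Here an ev99 reporting BWX+CIX+FIX+precise+NIX gets the
BWX+CIX+FIX+precise library with no new configuration entry, and a CPU
with only BWX+CIX still gets the BWX build rather than the baseline.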



cgd