Subject: Re: porting to a idt rc32332
To: Toru Nishimura <nisimura@itc.aist-nara.ac.jp>
From: Chris G. Demetriou <cgd@netbsd.org>
List: tech-embed
Date: 02/28/2001 15:55:18
nisimura@itc.aist-nara.ac.jp (Toru Nishimura) writes:
> Whenever I look at mips/ directory, my lung starts choking for more
> fresh oxygen.

heh.


> The IDT chip is one implementation of recently defined
> MIPS32 specification.

Close, but not quite.  I've not looked closely at the manuals, but it
differs from MIPS32 in at least one way: it does not identify itself
as a MIPS32 processor.  (In particular, at minimum PRid and config are
non-MIPS32.)


> For insn set wise it's a MIPS-II,

plus a few instructions.


> but has R4000
> style doubled TLB entry MMU,

yup, but all the CP0 regs are 32-bits.


> yet 32bit long for everything,

more properly stated, has 32-bit GPRs.


> has 2way
> set associative primary cache.

That's mostly orthogonal to its MIPS32-ness.


> What necessary to have is closures
> to encapsulate parameters and "ops" to build sane foundations like this;
> 
> struct cpuops mips1_cpuops = {
>         mips1_inv_icache,
>         mips1_sync_inv_dcache,
>         mips1_sync_dcache,
>         mips1_inv_dcache,
>         mips1_flush_cache,
>         mips1_SETASID,
>         mips1_TBIAP,
>         mips1_TBIS,
>         mips1_TLBWR,
>         mips1_wbflush,          /* depends on hardware implementation */
> };

part of the problem with this, by the way, is that the arguments to
the cache functions seem to differ in meanings between MIPS1 and MIPS3
variants.


> [ ... ]

This may be a good start, but is not sufficient for sane support of
modern CPUs in several ways.

(1) you need to abstract l1, l2, possibly l3 caches, via separate
    functions.  Some may be provided as part of the CPU code, some may
    be provided by board or system level code.

(2) you need to make some accommodation for different CCA values used
    by different CPUs.

(3) you need to make some accommodation for CPUs with coherent memory
    systems.  A lot of D-cache flushes are typically unnecessary for
    them.  Also, depending on the exact type of (coherent) system,
    mixing uncached and cacheable accesses to a location may be
    dangerous w.r.t. coherency.

(4) you need to make better accommodation for CPUs with MIPS3-style
    cache ops but physically indexed (at least D) caches.

(5) If possible, the system needs to rely solely on 'hit' cache ops,
    rather than index cache ops, for CPUs with MIPS3-style cache ops
    (except in the case of flushing the whole cache).  hit op -> whole
    cache flush conversion should be done based on some type of
    heuristic.  The issue here is, with direct-mapped caches, who
    cares, but with 2-, 4-, or even 8-way set-associative caches,
    using index ops blows a large hole in your cache.

(6) you really need to provide cpu-specific functions for TLB miss
    handling, etc.  If you look at the linux mips sources, you'll see
    that there's much variation between CPUs that's addressed there
    that we just punt on.

And, of course, the cache and TLB refill (especially the TLB refill)
are fairly performance sensitive.
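
To make (5) a bit more concrete, here is a rough sketch of one possible
"hit op -> whole cache flush" heuristic.  Everything in it is made up
for illustration (struct cache_desc, hit_ops_needed(),
choose_flush_plan(), and the threshold itself); it only assumes that
once a range covers at least as many lines as the cache holds, walking
it with hit ops is no cheaper than flushing the whole thing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one cache level (illustrative only). */
struct cache_desc {
	size_t size;		/* total size in bytes */
	size_t line_size;	/* bytes per cache line */
};

enum flush_plan { FLUSH_BY_HIT, FLUSH_WHOLE };

/* Number of per-line 'hit' ops needed to cover [addr, addr + len). */
static size_t
hit_ops_needed(const struct cache_desc *c, size_t addr, size_t len)
{
	size_t first, last;

	assert(c->line_size != 0);
	if (len == 0)
		return 0;
	first = addr / c->line_size;
	last = (addr + len - 1) / c->line_size;
	return last - first + 1;
}

/*
 * Assumed heuristic: use hit ops unless the range needs at least as
 * many ops as the cache has lines, in which case flush everything.
 */
static enum flush_plan
choose_flush_plan(const struct cache_desc *c, size_t addr, size_t len)
{
	size_t total_lines = c->size / c->line_size;

	if (hit_ops_needed(c, addr, len) >= total_lines)
		return FLUSH_WHOLE;
	return FLUSH_BY_HIT;
}
```

A real threshold would probably want tuning per CPU; the point is only
that index ops get used for the whole-cache case and nowhere else.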


My personal opinion on how this stuff should be handled:

(0) if MP, assume homogeneous CPUs.  Otherwise, you'll soon go mad.

(1) for performance-critical functions, provide generic
    implementations, followed by padding, for all the functions that
    indirect through a function switch (but with finer granularity of
    contents, e.g. l1 cache flush fns, l2 cache flush fns, etc. plus
    some additional information about the size of those fns.)

(2) very early on in kernel start, look at CPU and system board type,
    to fill in such a function table, with such things as the above.

(3) if !DEBUG, for performance critical functions, overwrite the
    generic 'jump through pointers' functions with the specific
    functions (concatenated together) for the CPU & system.  Use the
    function switch to flush the Icache.  8-)


I'd do the cache functions in the function switch as:

	typedef void (*cache_fn)(void);

	struct cache_op {	/* one of these for each basic cache op */
		int n_cache_fns;
		cache_fn cache_fns[MAX_CACHE_FNS];
		size_t cache_fn_nonend_copy_size[MAX_CACHE_FNS];
		size_t cache_fn_end_copy_size[MAX_CACHE_FNS];
	};

where copy size is the amount to copy to the fixed address, depending
on whether or not it's the last function.  (The last would include the
return.  8-)  The functions would have to be coded to preserve their
arguments, so you could "fall through" from one to the next, etc.
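
A rough sketch of the concatenation step, simulated on plain byte
buffers so it can be tried in userland.  struct fragment and
concat_fragments() are hypothetical names, and the byte values in any
test data are fake; in a real kernel the source would be the actual
function text, the destination would be the fixed address of the
generic function, and the Icache would have to be synced afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for one cache function's machine code (illustrative only). */
struct fragment {
	const unsigned char *code;
	size_t nonend_copy_size;	/* body only, so execution falls through */
	size_t end_copy_size;		/* body plus the return sequence */
};

/*
 * Copy n fragments back to back into dst.  Every fragment but the last
 * is copied without its return so it falls through into the next one.
 * Returns the number of bytes written, or 0 if dst is too small.
 */
static size_t
concat_fragments(unsigned char *dst, size_t dst_size,
    const struct fragment *frags, int n)
{
	size_t off = 0;
	int i;

	assert(dst != NULL && frags != NULL);
	for (i = 0; i < n; i++) {
		size_t sz = (i == n - 1) ?
		    frags[i].end_copy_size : frags[i].nonend_copy_size;

		if (sz > dst_size - off)
			return 0;	/* would overrun the padding */
		memcpy(dst + off, frags[i].code, sz);
		off += sz;
	}
	return off;
}
```

The size check is the reason the generic implementations need to be
followed by padding: the concatenated result has to fit in whatever
space was reserved at the fixed address.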

I think it'd probably be correct to fill in the CPU functions first,
then the board-level functions.  (Only case where i think that might
be wrong is for icache-sync... i'd think about that for a while before
implementing.)
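
For the non-overwritten (DEBUG) case, the generic 'jump through
pointers' path and the fill order might look roughly like the sketch
below.  The helper names (cache_op_add(), cache_op_run()) and the two
example routines are hypothetical, the copy-size fields are left out
for brevity, and the trace buffer exists only so the call order can be
observed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE_FNS 4

typedef void (*cache_fn)(void);

struct cache_op {
	int n_cache_fns;
	cache_fn cache_fns[MAX_CACHE_FNS];
};

/* Records which (hypothetical) routine ran, in order. */
static char trace[8];
static size_t trace_len;

static void
note(char c)
{
	if (trace_len < sizeof(trace) - 1)
		trace[trace_len++] = c;
}

static void cpu_l1_wbinv(void) { note('1'); }	/* stands in for CPU code */
static void board_l2_wbinv(void) { note('2'); }	/* stands in for board code */

/* Append one function to an op; returns -1 if the table is full. */
static int
cache_op_add(struct cache_op *op, cache_fn fn)
{
	assert(fn != NULL);
	if (op->n_cache_fns >= MAX_CACHE_FNS)
		return -1;
	op->cache_fns[op->n_cache_fns++] = fn;
	return 0;
}

/* Generic dispatcher: call every registered function in fill order. */
static void
cache_op_run(const struct cache_op *op)
{
	int i;

	for (i = 0; i < op->n_cache_fns; i++)
		op->cache_fns[i]();
}
```

Early in kernel start you'd call cache_op_add() with the CPU's function
first and then the board's, so that cache_op_run() works outward from
l1.  Whether that order is also right for icache-sync is exactly the
open question above.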


anyway, just some thoughts to chew on.  8-)



chris
-- 
Chris Demetriou - cgd@netbsd.org - http://www.netbsd.org/People/Pages/cgd.html
Disclaimer: Not speaking for NetBSD, just expressing my own opinion.