Port-m68k archive


Re: m68k cacheops header -- some questions



> Hey folks,

(Sorry for the delayed reply; I needed some time to dig into
 a few memories in my head.)

> Some questions about the m68k cacheops header:
> 
> 1. The TBIS (TLB-invalidate-single) always operate on the
> user and supervisor side.  Is it worth having separate
> USER vs KERN versions of these operations to support systems
> that can do it (which I think includes the `851, `030, `040,
> and `060)?  Obviously the HP MMU doesn't get to have this
> optimization.  Wildcard: I'm not certain what happens with
> the D-cache on the `030 (are the cache lines tagged with the
> function code?  I need to re-read the manual I guess...)

I have little knowledge of the m68k TLB/ATC, but the conclusion of
a discussion with ChatGPT (heh) seems reasonable:

---

 Regarding whether it is worthwhile to split TBIS into
 USER vs KERN variants:

 Linux/m68k is actually a useful reference here, since
 it implements this distinction quite explicitly.

  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/m68k/include/asm/tlbflush.h?h=v6.17.8

 For 030/040/060 systems:

  * flush_tlb_page() invalidates user-side ATC entries only
    (using pflush #user,addr / pflush #0,#4,(addr) depending
     on the CPU).

  * flush_tlb_kernel_page() invalidates kernel-side entries only
    (pflush #supervisor,addr / pflush #4,#4,(addr)).

  * flush_tlb_all() falls back to pflusha, invalidating both sides.

 So Linux does make use of the USER/SUPERVISOR separation that
 the m68k MMU provides, and uses it to avoid unnecessary
 invalidation of kernel ATC entries during normal user-space
 page operations.

 Whether this buys us anything in NetBSD is a separate question.
 On m68k TLB/ATC sizes are small, kernel mappings are stable, and
 the frequency of TLB operations is relatively low. The simplicity
 of a unified TBIS may very well outweigh the small performance
 benefit of splitting USER vs KERN operations. But at least from
 the Linux side, there is clear precedent for using the distinction
 when the hardware supports it.

---
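To make the trade-off concrete, here is a toy model in C (not kernel
code, and not how the hardware is programmed). It assumes an ATC whose
entries carry a user/supervisor tag standing in for the function code;
the names atc_entry, tbis_both() and tbis_space() are made up for this
sketch. It only shows that a split TBIS leaves the other side's entry
for the same page alone:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: a real ATC is not visible to software like this. */
enum atc_space { ATC_USER, ATC_SUPER };

struct atc_entry {
	bool           valid;
	enum atc_space space;	/* stands in for the function-code tag */
	uint32_t       va;	/* page-aligned virtual address */
};

/* Unified TBIS: drop the page from both address spaces. */
static int
tbis_both(struct atc_entry *atc, size_t n, uint32_t va)
{
	int dropped = 0;

	assert(atc != NULL);
	for (size_t i = 0; i < n; i++) {
		if (atc[i].valid && atc[i].va == va) {
			atc[i].valid = false;
			dropped++;
		}
	}
	return dropped;
}

/* Split TBIS: drop the page from one address space only. */
static int
tbis_space(struct atc_entry *atc, size_t n, uint32_t va, enum atc_space sp)
{
	int dropped = 0;

	assert(atc != NULL);
	for (size_t i = 0; i < n; i++) {
		if (atc[i].valid && atc[i].va == va && atc[i].space == sp) {
			atc[i].valid = false;
			dropped++;
		}
	}
	return dropped;
}
```

With a user and a supervisor entry for the same page, tbis_space(...,
ATC_USER) drops one entry and keeps the supervisor one, while
tbis_both() drops both. Whether that saved kernel entry is worth the
extra code is exactly the open question above.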


> 2. What's the difference between ICIA and ICPA?  As far as
> I can tell, they're equivalent within each CPU+MMU tuple.

We have to check "why ICPA was introduced in 4.4BSD".

ICPA() (and also ICPL(), ICPP(), DCPL(), DCPP(), DCPA() etc.)
seem to have been implemented for the HP380, i.e. for the 68040 L2
physical cache (maybe for DMA ops):
 https://svnweb.freebsd.org/csrg/sys/hp300/hp300/locore.s?r1=50544&r2=53933

On the other hand, 4.3BSD already had ICIA() even for 020/030
and it was used in cachectl1() in hp300/sys_machdep.c etc.

But on the 68040 ICIA() is equivalent to ICPA(), so the difference
between ICIA() and ICPA() is one of 'usage, intention, readability,
or history' etc., I guess.


> 3. What's with PCIA_20?  It uses the CACR, but the 68020 doesn't
> have an on-chip D-cache, right?

It looks like PCIA() was also prepared for the above cachectl1() ops etc.

- The initial 4.4BSD (or HPBSD) just invalidated the PA L2 cache on the 370:

 https://svnweb.freebsd.org/csrg/sys/hp300/hp300/locore.s?revision=41476&view=markup#l1987
---
#if defined(HP370)
ENTRY(PCIA)
        tstl    _ectype                 | got external PAC?
        jge     Lnocache6               | no, all done
        movl    #_IObase+MMUCMD,a0      | MMU control reg
        andl    #~MMU_CEN,a0@           | disable cache
        orl     #MMU_CEN,a0@            | reenable cache
Lnocache6:
        rts
#endif
---

- Later it was changed to also invalidate the L1 D-cache on the 030.
  From that point on, DC_CLEAR is always written to %cacr, even on
  the 020 (as long as 030 models are configured):

 https://svnweb.freebsd.org/csrg/sys/hp300/hp300/locore.s?annotate=45751#l2092
---
ENTRY(PCIA)
#if defined(HP360) || defined(HP370)
        movl    #DC_CLEAR,d0
        movc    d0,cacr                 | invalidate on-chip d-cache
        tstl    _ectype                 | got external PAC?
        jge     Lnocache6               | no, all done
        MMUADDR(a0)
        andl    #~MMU_CEN,a0@(MMUCMD)   | disable cache in MMU control reg
        orl     #MMU_CEN,a0@(MMUCMD)    | reenable cache in MMU control reg
Lnocache6:
#endif
---

On the other hand, the NetBSD/m68k folks tried to unify these
TLB (ATC) and cache ops across all of 020/030/040/060.

The first version was implemented by leo@ in 1997
(maybe for Atari 040/060 Milan/Falcon?):
 https://github.com/NetBSD/src/commit/6793c446f058a9442b714a20482f441a0bc31796

This version was used on atari, mvme68k etc., but hp300 and
other ports were not switched (maybe less motivation for 060?).

The second version was done by chs@ in 2002, to get as many
TLB/cache ops as possible properly inlined:
 https://github.com/NetBSD/src/commit/40e5b8394f419d68ea594794d230d99ace0990d8#diff-5b967f9ab41c6ed689673b2ddf6158d8e0b2022068b7d0684251b2ae22993126

It looks like both versions just kept the original 4.4BSD behavior
of PCIA(), i.e. writing DC_CLEAR to %cacr even on the 020.
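As a rough illustration of why that write is probably harmless on the
020, here is a small C sketch. The bit positions are the 68030 CACR
layout as I remember it from the user's manual. DC_CLEAR_SKETCH is a
made-up stand-in (the real DC_CLEAR constant may carry more bits), and
the masking assumes the 68020 simply ignores writes to the CACR bits it
does not implement, which should be checked against the 68020 manual:

```c
#include <assert.h>
#include <stdint.h>

/* 68030 CACR bits (from memory of the user's manual; please verify). */
#define CACR30_EI	0x0001u	/* enable I-cache */
#define CACR30_CI	0x0008u	/* clear I-cache */
#define CACR30_ED	0x0100u	/* enable D-cache */
#define CACR30_CD	0x0800u	/* clear D-cache */

/* The 68020 CACR has only the low four (I-cache) bits. */
#define CACR20_IMPLEMENTED	0x000fu

/* Hypothetical stand-in for DC_CLEAR: clear D-cache, keep both caches on. */
#define DC_CLEAR_SKETCH	(CACR30_CD | CACR30_ED | CACR30_EI)

/* What a CACR write amounts to on an 020, under the assumption above. */
static uint32_t
cacr_effective_020(uint32_t value)
{
	return value & CACR20_IMPLEMENTED;
}

/* Usage: the D-cache bits vanish, the I-cache stays enabled and uncleared. */
static void
cacr_sketch_selfcheck(void)
{
	uint32_t v = cacr_effective_020(DC_CLEAR_SKETCH);

	assert((v & (CACR30_CD | CACR30_ED)) == 0);
	assert((v & CACR30_EI) != 0);
	assert((v & CACR30_CI) == 0);
}
```

Under these assumptions PCIA() on an 020 degenerates to "keep the
I-cache enabled", so the D-cache part is a no-op rather than a bug.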


> 4. Ditto with PCIA_30?  The name implies "physical", which of
> course the 68030 D-cache is not.  I suppose this is there as
> a fall through for 68030 systems with external phys caches
> (if we're invalidating the external cache, we should invalidate
> the internal one, too).  Just confirming.

I guess this is because someone used PCIA() in hp300/dev/dma.c
to flush the instruction cache when loading programs?
I.e. PCIA() was used as "flush I-cache", and the "P" no longer
had any meaning after that.


> 5. If so, why don't we just have a single DCIA_30?

Maybe because DCIA (which seems to imply invalidating the VAC to
avoid aliases) was used almost only in pmap_enter(), not in drivers?


> 6. Why on earth doesn't the m68k bus_dma invalidate the D-cache
> in the PREREAD (or POSTREAD, since the cache is write-through) case?

Probably this is because the original m68k bus_dma.c was derived
from the next68k version:
 https://github.com/NetBSD/src/commit/0844eedc550c1e5dd5262c84a5fd02465347db1f
and NetBSD/next68k supported only 040 machines.

As you said, the L1 D-cache on the 030 should also be invalidated/flushed
on PREREAD. But I guess that in most cases the data cached in the small
256-byte D-cache does not survive the complex DMA setup ops anyway.
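For reference, the hazard itself can be shown with a toy model in C
(not bus_dma code). It assumes a one-line write-through cache in front
of a four-byte "RAM" and a device that writes RAM directly; all names
here (toy_cache, cpu_read(), dma_write() etc.) are invented for this
sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RAM_SIZE 4

/* Toy one-line write-through cache; nothing like the real 030 layout. */
struct toy_cache {
	bool     valid;
	unsigned index;		/* which RAM byte is cached */
	uint8_t  data;
};

static uint8_t toy_ram[TOY_RAM_SIZE];

/* CPU read: a hit is served from the cache without looking at RAM. */
static uint8_t
cpu_read(struct toy_cache *c, unsigned i)
{
	assert(i < TOY_RAM_SIZE);
	if (c->valid && c->index == i)
		return c->data;
	c->valid = true;
	c->index = i;
	c->data = toy_ram[i];
	return c->data;
}

/* CPU write: write-through, so RAM is always up to date for DMA reads. */
static void
cpu_write(struct toy_cache *c, unsigned i, uint8_t v)
{
	assert(i < TOY_RAM_SIZE);
	toy_ram[i] = v;
	if (c->valid && c->index == i)
		c->data = v;
}

/* Device write (DMA "read" from the driver's view): bypasses the cache. */
static void
dma_write(unsigned i, uint8_t v)
{
	assert(i < TOY_RAM_SIZE);
	toy_ram[i] = v;
}

/* What a PREREAD/POSTREAD sync would have to do for this cache. */
static void
cache_invalidate(struct toy_cache *c)
{
	c->valid = false;
}
```

If the CPU has the buffer byte cached, a dma_write() followed by
cpu_read() still returns the old value; only after cache_invalidate()
does the CPU see what the device wrote. Because the cache is
write-through, the opposite direction (device reading RAM) needs no
such step in this model.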

Furthermore, I wonder how many 030 models were there that had
"generic devices with modern drivers using MI bus_dma(9)".
(I see mac68k has NuBus Sonic Ethernet, but..)

Thanks,
---
Izumi Tsutsui

