tech-kern: VIPT cache handling (Re: port-sh3/34243)

Subject: VIPT cache handling (Re: port-sh3/34243)
To: None <tech-kern@NetBSD.org>
From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
List: tech-kern
Date: 09/07/2006 00:39:41
uwe@ptc.spbu.ru wrote:

> I'd be interested in hearing ideas of how to fix this properly.
> 
> PR has more information.  Essentially the symptom is that certain
> programs "hang" in infinite TLB miss on startup b/c executing their
> .init section triggers the condition.

We have to consider how virtually-indexed, physically-tagged (VIPT)
caches should be handled in both the MI and MD layers.

As mentioned in Curt Schimmel's book (ISBN 0-201-63338-8),
a pure virtually-indexed cache system has two problems:
ambiguity and alias.
Ambiguity means "different PAs are mapped into the same VA,"
and alias means "the same PA is mapped into different VAs."
The operating system has to handle these problems by flushing
the cache or by disabling caching for such addresses.

On the other hand, a VIPT cache system uses the physical address tag
to decide whether a cacheline hits, so the ambiguity problem can't
happen, because cache tags never match across different PAs.
But aliases can still happen on VIPT cache systems: if the same PA
is mapped into different VAs, those VAs can have different virtual
indexes while certainly having the same physical tag, so multiple
cachelines might hold data for the single PA.

On a usual VM system, the virtual address space is managed per page,
so the PGOFSET bits are always the same between VA and PA. Therefore
the PGOFSET bits of the virtual cache index are always the same
for all VAs mapped to the same physical page.
If "cachelinesize * number-of-indexes"
(or "cachesize / number-of-ways") is equal to or smaller than
the pagesize, the virtual indexes for the same PA are always the same,
so aliases won't happen.
Aliases can happen only if that number is larger than the pagesize.

On a 4KB/page system, if the CPU has an 8KB direct mapped (== 1 way)
cache, data at a single physical address could have two possible
virtual indexes.
If the CPU has a 32KB two-way set associative cache, one PA could have
four possible virtual indexes. I don't know what this "how many
possible virtual indexes against the same PA" should be called,
but I'd call it the "number of virtual cache page indexes" here.
On a system with 4KB pages and a 16KB direct mapped cache (e.g. SH7750),
bits [13:12] of the VA indicate the "virtual cache page index" of the VA.
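
The arithmetic above can be written down as a minimal sketch in C.
The helper names (vcache_ncolors, vcache_color) are made up for
illustration and are not taken from any NetBSD pmap; the sketch also
assumes that the number of indexes is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only.
 *
 * Number of possible virtual cache page indexes:
 * (cachesize / number-of-ways) / pagesize, or 1 if one way
 * fits within a page (no alias possible).
 */
static unsigned
vcache_ncolors(unsigned cache_size, unsigned ways, unsigned page_size)
{
	unsigned way_size = cache_size / ways;

	return (way_size <= page_size) ? 1 : way_size / page_size;
}

/* Virtual cache page index of a VA (ncolors must be a power of two). */
static unsigned
vcache_color(uintptr_t va, unsigned page_size, unsigned ncolors)
{
	assert(ncolors != 0 && (ncolors & (ncolors - 1)) == 0);
	return (unsigned)((va / page_size) & (ncolors - 1));
}
```

With 4KB pages, vcache_ncolors() gives 2 for the 8KB direct mapped
case and 4 for both the 32KB two-way and the 16KB direct mapped cases.
For the last one, vcache_color() simply extracts bits [13:12] of the VA.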


The problem mentioned in this PR is caused by the way the current
sh3 pmap avoids the alias problem when the same PA is mapped into
different VAs which have different virtual cache page indexes.
Currently it allows only one mapping for each physical address
at a time, regardless of the virtual cache page indexes.

The problem happens if the page the program is running from is also
mapped at a different VA, and the current instruction tries to access
that VA (which has the same PA as the VA the program is running from).
The access to that VA causes a fault, pmap_enter(9) is called,
and the pmap unmaps the VA the program is running from. Another page
fault happens as soon as the first fault returns, this time on the
instruction fetch, which in turn unmaps the other VA, so the program
is stuck in an infinite loop.
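
The ping-pong described above can be reproduced with a toy model.
This is only a sketch under simplifying assumptions (one physical page,
at most one valid mapping, a bounded fault count); all toy_* names are
hypothetical and nothing here is actual sh3 pmap code.

```c
#include <stdint.h>

#define NO_VA	((uintptr_t)-1)

/* Toy model: one physical page with at most one valid mapping. */
struct toy_page {
	uintptr_t mapped_va;		/* NO_VA when unmapped */
};

/* Models the "one mapping per PA" policy: a new VA evicts the old one. */
static void
toy_enter_exclusive(struct toy_page *pg, uintptr_t va)
{
	pg->mapped_va = va;
}

/*
 * Execute one instruction located at text_va which loads from data_va,
 * where both VAs are backed by the same physical page.
 * Returns the number of faults taken before the instruction completes,
 * or -1 if it has not completed after max_faults faults.
 */
static int
toy_run_insn(struct toy_page *pg, uintptr_t text_va, uintptr_t data_va,
    int max_faults)
{
	int faults = 0;

	while (faults < max_faults) {
		if (pg->mapped_va != text_va) {	/* fetch faults */
			toy_enter_exclusive(pg, text_va);
			faults++;
			continue;
		}
		if (pg->mapped_va != data_va) {	/* data access faults */
			toy_enter_exclusive(pg, data_va);
			faults++;
			continue;		/* restart the insn */
		}
		return faults;			/* both accesses hit */
	}
	return -1;
}
```

When text_va and data_va differ, every fault evicts the mapping the
other access needs, so toy_run_insn() never completes and returns -1
no matter how large max_faults is.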

AFAIK, there is only one possible strategy to fix this situation
in the MD pmap:

- allow multiple mappings for a single PA even if the mapped VAs have
  different virtual cache page indexes
 AND
- make the VA pages uncached if the VAs requested to be mapped for
  a single PA have different virtual cache page indexes

This is what the current mips pmap does for CPUs with an R4K style MMU.
My previous patch for the current sh3 pmap only does the former.
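
The two rules in the list above could look roughly like the following
sketch. It is a user-space toy, not the mips or sh3 pmap: the toy_*
names, the fixed-size mapping table and the constants (4KB pages,
4 virtual cache page indexes) are assumptions made for illustration,
and the cache writeback/invalidate a real pmap needs is only marked
by a comment. Removing mappings (and re-enabling caching) is not
modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PGSHIFT	12	/* assumed 4KB pages */
#define TOY_NCOLORS	4	/* assumed, e.g. 16KB direct mapped cache */
#define TOY_MAXPV	8	/* toy limit on mappings per page */

struct toy_pv {
	uintptr_t va;
	bool cached;
};

struct toy_page {
	struct toy_pv pv[TOY_MAXPV];	/* all VAs mapping this PA */
	size_t npv;
};

static unsigned
toy_color(uintptr_t va)
{
	return (unsigned)((va >> TOY_PGSHIFT) & (TOY_NCOLORS - 1));
}

/*
 * Add a mapping of the page at va.  Multiple mappings are allowed;
 * if any two of them differ in virtual cache page index, all of
 * them become uncached.  Returns false only if the toy table is full.
 */
static bool
toy_enter(struct toy_page *pg, uintptr_t va)
{
	bool conflict = false;
	size_t i;

	if (pg->npv == TOY_MAXPV)
		return false;

	for (i = 0; i < pg->npv; i++)
		if (toy_color(pg->pv[i].va) != toy_color(va))
			conflict = true;

	pg->pv[pg->npv].va = va;
	pg->pv[pg->npv].cached = true;
	pg->npv++;

	if (conflict) {
		/* A real pmap would write back and invalidate here. */
		for (i = 0; i < pg->npv; i++)
			pg->pv[i].cached = false;
	}
	return true;
}
```

Once a page has mappings with two different indexes, any further VA
differs from at least one of them, so comparing only against the new
VA is enough to keep the page uncached.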

Of course, making pages uncached may cause serious performance hits,
so the MI VM system and the dynamic linker should also take this
alias problem on VIPT cache systems into account.
Unfortunately there is almost nothing in them to handle it, so the
current mips (and sh4) pmaps have a bunch of (possibly avoidable)
cache flush ops.
See the pmap_copy_page(9) and pmap_zero_page(9) functions etc. in the
mips and sh3 pmaps.

Note that this alias problem can happen even when a new VA is mapped
after the previously mapped VA has been unmapped, unless the
cache data for the previously mapped VA is explicitly flushed.


On pure virtual cache systems, we can't avoid such
a bunch of cache flush ops, and that is their weak point.
But on VIPT cache systems, maybe we could avoid most of
(or only a certain number of?) the flush ops because
the number of possible virtual cache page indexes is not so large
(usually 2~4, or possibly 8).

If the MI VM (and the dynamic linker) could keep the restriction that
"a single PA can be mapped only into VAs which have the same
 virtual cache page index," no cache flush ops would be needed.
But such a restriction is probably not realistic, so the only thing
we can do is to avoid mapping a PA into VAs which have
different virtual cache page indexes "as much as possible."

On the MI VM system, we could add a hint value to struct vm_page
which indicates the virtual cache page index where the page was
previously mapped. If all cache data for the page has been flushed
explicitly, the hint value could be "wildcard".
If we know the VA before the actual mapping, we could use
this hint value to select which free vm_page should be used, and
pmap_copy_page(9) and pmap_zero_page(9) could use it to select
reserved VAs for their ops.
Chuck Silvers has some ideas about this. We already have multiple
freelists (for physical cache coloring) in uvm_pagelist.c, so
it might be trivial to prepare extra buckets for each virtual cache
page index (and wildcard).
If we can use any VA to create a shared mapping, we can just choose
an appropriate VA which has the same virtual cache page index,
if the uvm_km_alloc(9) family is changed to take such an index argument.
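
A rough sketch of the hint idea follows: a free list with one bucket
per virtual cache page index plus a wildcard bucket. This is not UVM
code. The toy_* names, the singly linked buckets and the constant of
4 indexes are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NCOLORS	4			/* assumed */
#define TOY_WILDCARD	TOY_NCOLORS		/* no stale cache lines */
#define TOY_NBUCKETS	(TOY_NCOLORS + 1)

struct toy_page {
	struct toy_page *next;
	unsigned hint;	/* last virtual cache page index, or wildcard */
};

struct toy_freelist {
	struct toy_page *bucket[TOY_NBUCKETS];
};

/* Free a page; "flushed" means its cache data was explicitly flushed. */
static void
toy_free(struct toy_freelist *fl, struct toy_page *pg, bool flushed)
{
	if (flushed)
		pg->hint = TOY_WILDCARD;
	pg->next = fl->bucket[pg->hint];
	fl->bucket[pg->hint] = pg;
}

static struct toy_page *
toy_pop(struct toy_freelist *fl, unsigned b)
{
	struct toy_page *pg = fl->bucket[b];

	if (pg != NULL)
		fl->bucket[b] = pg->next;
	return pg;
}

/*
 * Allocate a page that will be mapped at a VA with the given index.
 * Prefer a page with a matching hint, then a wildcard page; only as
 * a last resort take a page with a different hint and tell the
 * caller that a cache flush is needed before use.
 */
static struct toy_page *
toy_alloc(struct toy_freelist *fl, unsigned color, bool *need_flush)
{
	struct toy_page *pg;
	unsigned b;

	*need_flush = false;
	if ((pg = toy_pop(fl, color)) == NULL &&
	    (pg = toy_pop(fl, TOY_WILDCARD)) == NULL) {
		for (b = 0; b < TOY_NCOLORS && pg == NULL; b++)
			pg = toy_pop(fl, b);
		if (pg == NULL)
			return NULL;
		*need_flush = true;
	}
	pg->hint = color;
	return pg;
}
```

In this model a flush is required only when both the matching bucket
and the wildcard bucket are empty, which is the "as much as possible"
behaviour described above.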

The dynamic linker can't know the number of virtual cache
page indexes at runtime, so maybe we have to define the maximum number
in the ABI of each CPU so that all shared mappings can always have
the same virtual cache page index.
(I'm not sure if this is really possible though)
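
If such an ABI-defined maximum existed, picking a mapping address
could be sketched as below. The function name and the alias_span
parameter (the assumed ABI maximum of "cachesize / number-of-ways",
a power of two) are hypothetical; no existing linker or ABI is being
described here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the lowest VA >= hint_va whose offset within alias_span
 * equals that of file_off, so every mapping of the same file offset
 * gets the same virtual cache page index.
 */
static uintptr_t
toy_alias_align(uintptr_t hint_va, uintptr_t file_off, uintptr_t alias_span)
{
	uintptr_t mask, va;

	assert(alias_span != 0 && (alias_span & (alias_span - 1)) == 0);
	mask = alias_span - 1;
	va = (hint_va & ~mask) + (file_off & mask);
	if (va < hint_va)
		va += alias_span;
	return va;
}
```

For example, with an assumed span of 16KB, a hint of 0x10002000 and
a file offset of 0x1000 the result is 0x10005000, which has the same
bits [13:12] as the file offset.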


Anyway, I'm afraid that the "proper" fixes to the MI system and
ABIs won't happen for some time (or ever?), so just fixing the MD pmap
(the multiple VA mapping handling mentioned above) is the only way to go
for now.
---
Izumi Tsutsui