Subject: Re: "pmap_unwire: wiring ... didn't change!"
To: Markus W Kilbinger <kilbi@rad.rwth-aachen.de>
From: Chuck Silvers <chuq@chuq.com>
List: port-cobalt
Date: 02/14/2005 09:41:58

On Sun, Feb 13, 2005 at 09:16:29PM +0100, Markus W Kilbinger wrote:
> >>>>> "Chuck" == Chuck Silvers <chuq@chuq.com> writes:
> 
>     >> I've applied your patch and I can confirm vanishing of the
>     >> "pmap_unwire: ..." messages (so far, 2 hours now).
> 
>     Chuck> cool.
> 
> (... 9.5 hours now ;-))
> 
>     >> But I still see (my?) data corruption problem:
> 
>     Chuck> that sounds like a CPU cache problem to me too, probably in
>     Chuck> bus_dma or the cache-flushing code itself. if it's
>     Chuck> happening during writes to disk rather than reads from disk
>     Chuck> then it's probably in the cache write-back part rather than
>     Chuck> the cache invalidate part. I didn't see anything in a brief
>     Chuck> look at the code, though.
> 
> The tests I mentioned, where I can reliably reproduce the data
> corruption, involve disk access; just _reading_ large amounts of data
> from disk is enough to get corruption.

so you get different corruption when you read the same file at different times?
that's useful to know.
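
one way to check that is to read the same file several times and compare
per-chunk checksums between passes. a minimal sketch, not something from the
tree: the function names and the 1 MB chunk size are made up for illustration,
and the file needs to be larger than RAM so that each pass really goes back to
the disk instead of being served from memory.

```python
import hashlib


def chunk_digests(path, chunk_size=1 << 20):
    """Return one SHA-256 hex digest per chunk_size-byte chunk of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            digests.append(hashlib.sha256(block).hexdigest())
    return digests


def differing_chunks(path, passes=3, chunk_size=1 << 20):
    """Read the file `passes` times and return the sorted indices of chunks
    whose digest differs from the first pass (hypothetical helper)."""
    baseline = chunk_digests(path, chunk_size)
    bad = set()
    for _ in range(passes - 1):
        current = chunk_digests(path, chunk_size)
        # A changed chunk count is itself a mismatch; flag the extra indices.
        lo, hi = sorted((len(baseline), len(current)))
        bad.update(range(lo, hi))
        for i, (a, b) in enumerate(zip(baseline, current)):
            if a != b:
                bad.add(i)
    return sorted(bad)
```

if the returned indices move around from run to run, the data on disk is
probably fine and the damage is happening on the way into memory.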


> I once tested my qube2's RAM with pkgsrc/sysutils/memtester, and no
> errors were reported.
> 
> I did not notice any data corruption when using my qube2 to route
> data, but I did no targeted stress testing of that.

network routing probably doesn't access the packet data from the CPU as much,
it would mostly be DMA.  or the bug could be in part of the code that the
network doesn't end up using.
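
to illustrate why the CPU touching the data matters, here is a toy model of a
write-back cache with no hardware DMA coherency. it is only a sketch of the
general failure mode, not the r5k or bus_dma code: the class and function names
are invented, and "RAM" is just a Python list. a missing invalidate shows up as
stale data after the device DMAs into memory (disk read), and a missing
write-back shows up as the device seeing old memory contents (disk write).

```python
class ToyCache:
    """Write-back CPU cache in front of RAM, with no DMA snooping (toy model)."""

    def __init__(self, ram):
        self.ram = ram      # memory shared with the "device"
        self.lines = {}     # addr -> (value, dirty)

    def cpu_read(self, addr):
        # On a miss, fill the line from RAM; on a hit, RAM is not consulted.
        if addr not in self.lines:
            self.lines[addr] = (self.ram[addr], False)
        return self.lines[addr][0]

    def cpu_write(self, addr, value):
        # Write-back policy: the store stays in the cache until flushed.
        self.lines[addr] = (value, True)

    def writeback(self, addr):
        if addr in self.lines and self.lines[addr][1]:
            value = self.lines[addr][0]
            self.ram[addr] = value
            self.lines[addr] = (value, False)

    def invalidate(self, addr):
        self.lines.pop(addr, None)


def dma_in(do_invalidate):
    """Device writes RAM (like a disk read); return what the CPU then sees."""
    ram = [0]
    cache = ToyCache(ram)
    cache.cpu_read(0)           # line is now cached with the old value
    ram[0] = 42                 # device DMAs new data straight into RAM
    if do_invalidate:
        cache.invalidate(0)
    return cache.cpu_read(0)


def dma_out(do_writeback):
    """CPU writes data (like a disk write); return what the device reads."""
    ram = [0]
    cache = ToyCache(ram)
    cache.cpu_write(0, 42)      # new data sits dirty in the cache
    if do_writeback:
        cache.writeback(0)
    return ram[0]               # device DMAs from RAM, bypassing the cache
```

in this model the stale value is only ever observed by `cpu_read`, so a
workload where the CPU rarely looks at the DMA'd data would rarely notice.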


>     Izumi> I guess there is something wrong in the r5k cache code,
>     Izumi> but I couldn't find any particular problem in cache_r5k.c
>     Izumi> when I looked at it (but I could be wrong).
> 
> Hmm, if the problem occurs on quite different hardware that just has
> the same MIPS CPU type, the (common) r5k cache handling really seems to
> be the most probable cause of the corruption (correct?). Or is bus_dma
> still a candidate?

could be either, we don't know yet.  the various versions of the bus_dma code
for all the MIPS3 platforms are pretty similar.


FYI, I'm probably not going to have time to pursue this cache problem soon,
so hopefully one of the other MIPS guys can run with it.

-Chuck