port-sparc: Re: LX vs SS20?

Subject: Re: LX vs SS20?
To: None <port-sparc@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: port-sparc
Date: 08/13/2003 21:55:32

I wrote stuff about a pseudo-disk driver that works on an LX but fails
on an SS20.  It gets weirder, and I'm now basically certain it's a
cache issue.

It definitely is not a problem getting the data between processes.  On
a suggestion from kre, I moved the buffer from the stack into mallocked
space (since, apparently, process A's kernel stack is not necessarily
going to be visible at the same virtual addresses when process B is
current).  That didn't help.  So I started adding more debugging
output.

Just to keep things interesting, I started seeing the data coming
through OK - sometimes.  Sometimes all 0x200 bytes came through,
sometimes 0x80 bytes, sometimes 0x100 bytes.  The block of 0x200 bytes
seems to be broken up into four blocks of 0x80 bytes each, and each
block either comes through intact or is replaced by zeroes.  (Why 128?
I have no idea.  The largest cache line is 64 bytes, and even that is
only the Icache; a Dcache line is 32 bytes.)  Of the 16 possible
right/wrong patterns, I've seen 13 after a hundred or so trials.  What
data does come through always seems to come through correctly.
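
For anyone who wants to tally the right/wrong patterns the same way, here
is a minimal sketch of how each transfer could be classified.  It is not
the code from my driver; the names chunk_pattern, SECTOR_SIZE, CHUNK_SIZE
and NCHUNKS are made up for illustration, and it assumes you have a copy
of the data you expected to see.

```c
#include <string.h>

#define SECTOR_SIZE 0x200                     /* one transfer, as described above */
#define CHUNK_SIZE  0x80                      /* observed granularity of the loss */
#define NCHUNKS     (SECTOR_SIZE / CHUNK_SIZE) /* four chunks, so 16 patterns */

/*
 * Compare what arrived against what was expected, one 0x80-byte chunk
 * at a time.  Bit i of the result is set if chunk i came through intact.
 * (Hypothetical helper; not part of any NetBSD interface.)
 */
unsigned chunk_pattern(const unsigned char *got, const unsigned char *expected)
{
    unsigned mask = 0;
    int i;

    for (i = 0; i < NCHUNKS; i++) {
        if (memcmp(got + i * CHUNK_SIZE,
                   expected + i * CHUNK_SIZE, CHUNK_SIZE) == 0)
            mask |= 1u << i;
    }
    return mask;
}
```

A result of 0xF means everything arrived; 0x0 means nothing did.  Counting
how often each of the 16 values turns up gives the tally mentioned above.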

I decided to try to see whether this was truly a cache problem.  I
built a kernel in which viking_cache_enable() was empty, in an attempt
to leave the cache disabled.  This version fell over quite hard; as
soon as it tried to access the disk, I started getting repeated "!TC on
DATA XFER" errors and it made no further progress.  Upon rebooting, I
found it had managed to destroy /dev's inode(!).  After putting the
system back together, I tried a version that called
srmmu_vcache_flush_context(), even though the SS20 appears to be a
Viking and thus is not supposed to need cache flushes (module_viking
has most of the cache functions nooped out).  This version simply
panicked as soon as it tried to flush the cache.  So I did it the hard
way: I allocated a megabyte of RAM and, when I wanted to flush the
cache, wrote every 16th byte of that megabyte.  (A megabyte because the
machine has not only internal Icache and Dcache, but also a megabyte of
"external" cache, according to the boot-time messages.)  When I turn on
calls to this routine (something I arranged to be able to do at run
time), this version works, apparently perfectly, though of course at a
performance price.
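
The do-it-the-hard-way flush looks roughly like the following.  This is a
userland-compilable sketch, not the kernel code itself: the names
hard_flush, flush_area, FLUSH_SIZE and FLUSH_STRIDE are invented here,
and a static array stands in for the megabyte of allocated RAM.  The idea
is only that dirtying a region as large as the external cache should push
older dirty lines out to memory; whether that holds depends on the cache
organization, so treat it as a diagnostic hack rather than a real flush.

```c
#include <stddef.h>

#define FLUSH_SIZE   (1024 * 1024)  /* one megabyte, matching the external cache */
#define FLUSH_STRIDE 16             /* write every 16th byte */

/* volatile so the compiler cannot optimize the stores away */
static volatile unsigned char flush_area[FLUSH_SIZE];

/*
 * Touch every FLUSH_STRIDE-th byte of the flush area with a store.
 * Returns the number of bytes written, purely as a sanity check.
 * (Hypothetical routine; the stride is smaller than any cache line
 * size mentioned above, so every line in the region gets dirtied.)
 */
size_t hard_flush(void)
{
    size_t off, touched = 0;

    for (off = 0; off < FLUSH_SIZE; off += FLUSH_STRIDE) {
        flush_area[off] = (unsigned char)off;
        touched++;
    }
    return touched;
}
```

Each call performs 65536 stores spread across the whole megabyte, which
is where the performance price comes from.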

The `missing' cache flush - the one I'm making up for with my own
do-it-the-hard-way flush - appears to be somewhere in the physio()
machinery.  I print out the data pointed to by b_data just before
exiting the strategy routine, and everything's there, always.  But when
I check with copyin() after physio() returns, in my raw-device read
routine, the bits haven't made it to userland - unless I call my
cache-flusher (which I call, when enabled, right after writing the data
in the strategy routine).

Thoughts?  I'm going to be investigating further, but any thoughts you
may have on the matter would be most welcome.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B