Subject: Re: 1.6 woes (pmap vs. UBC?)
To: NetBSD/sparc Discussion List <port-sparc@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: port-sparc
Date: 08/15/2002 02:33:15
[ On Wednesday, August 7, 2002 at 20:49:17 (+0200), Manuel Bouyer wrote: ]
> Subject: Re: 1.6 woes (pmap vs. UBC?)
>
> On Mon, Aug 05, 2002 at 06:59:58PM -0400, Greg A. Woods wrote:
> > Today I managed to get around to compiling a kernel with Manuel's patch.
> > 
> > My formerly speedy SS-1+ is now crawling like a snail, even just running
> > as a diskless workstation!  :-)
> 
> Yes, these cache flush ops have a high cost.

Just FYI, I remembered the other day that my XsunMono binary had last
been compiled with just '-g' (or maybe it was '-g -O' -- I don't recall
for sure, and I don't see an obvious way to tell after the fact).

So I rebuilt it with '-O2', and after over nine full days of uptime
(about half of those days under fairly heavy, i.e. "normal", use) I
logged out and started up the optimized Xserver again.  I also went back
to using XFS instead of NFS for my font path.

Now my SS-1+ with a full 40MB of RAM is "only" about as slow for normal
X11 work as I remember my 3/60 (with 24MB) to be (it was running
SunOS-4.1.1).  :-)  (moving the mouse between windows no longer leaves
time for a sip of coffee :-)

(though for some reason deleting a word or killing a region in emacs,
which puts the deleted text in the X cut buffer, takes extraordinarily
long -- perhaps an order of magnitude or two longer than marking text
in an xterm window does, and seemingly the same extended amount of
time no matter whether it's one character being cut or a whole
paragraph)

So, I'm fairly certain that the cache flush patch really does fix the
problem, or at least mask it sufficiently that it's now only a very
remote possibility, not just a rare one.  Now I'd hope it's just a
matter of finding out when the cache really needs a full flush, and
when perhaps only a few pages need flushing, so that we can get the
speed back up to normal.

I was wondering if maybe it's not the page vs. region flush that's the
problem, but instead whether the region isn't quite big enough (perhaps
due to some off-by-one error somewhere).  I don't know if I have the
lingo exactly right here, but in any case maybe we could modify the
patch to flush only the region, as it would without any change, and
also flush the two pages on either side of the region (i.e. the one
adjacent page below it and the one above it).  Or maybe we could try
flushing the whole cache as a single region instead of individually
flushing every page.
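
To make that a bit more concrete, here's roughly the experiment I have
in mind.  This is only a sketch -- flush_cache_page(),
flush_cache_range(), and PGSIZE below are placeholder names I made up
for whatever the real sparc cache-flush routines and page-size constant
are called, not the actual pmap interfaces:

/*
 * Sketch only, not real pmap code: flush_cache_page(),
 * flush_cache_range(), and PGSIZE are placeholders.
 */
#define PGSIZE          4096    /* assumed hardware page size */
#define trunc_pg(va)    ((va) & ~(unsigned long)(PGSIZE - 1))
#define round_pg(va)    trunc_pg((va) + PGSIZE - 1)

void    flush_cache_page(unsigned long va);                     /* placeholder */
void    flush_cache_range(unsigned long va, unsigned long len); /* placeholder */

/* variant 1: the region flush as before, plus one page on each side */
void
flush_region_padded(unsigned long start, unsigned long end)
{
        flush_cache_range(start, end - start);          /* the unchanged flush */
        flush_cache_page(trunc_pg(start) - PGSIZE);     /* page just below
                                                           (ignoring va==0) */
        flush_cache_page(round_pg(end));                /* page just above */
}

/* variant 2: one flush over the whole range instead of per-page flushes */
void
flush_region_whole(unsigned long start, unsigned long end)
{
        flush_cache_range(trunc_pg(start), round_pg(end) - trunc_pg(start));
}

Variant 1 would tell us whether the region being flushed is just
slightly too small, and variant 2 whether one big flush is cheaper than
looping over every page.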

Then again maybe the excessive flushes are simply making up for some
other missing flush, and this one place in the code just happens to be
a good enough spot in the execution path to flush the stale page(s)
before it's too late.

Though I'm almost completely naive about the details of the sparc pmap,
the latter scenario makes a wee bit more sense to me given that even a
big long-running process like the Xserver only occasionally crashes --
it doesn't seem to crash every time a certain memory access pattern
occurs, though even in a relatively stable and simple system the order
of memory operations is perhaps less predictable than I'm imagining.
But still, there didn't seem to be any regular corruption happening --
it's a relatively rare event.

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>