Subject: a perf/reliability tradeoff for 1.6 vs. Sun-4c with "sw flush" cache
To: NetBSD/sparc Discussion List <port-sparc@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: port-sparc
Date: 09/21/2002 17:39:53
Well, my xserver process finally SIGSEGV'ed on its second test run.

This time it ran for almost 5 hours wall-clock time, and accumulated
just over 76.5 minutes of CPU time.  I seem to remember regularly
getting more than twice that much CPU time from it during regular use
under 1.3.2, but perhaps under regular use this one will last twice as
long too -- I was pushing context switching to the max, and running
hundreds of little processes with output spewing to locally run xterms,
as well as running a pair of 'ico's locally and one remotely.  I even
ran mpg123 from an rlogin session a couple of times too.  All of that
was on top of my regular X desktop, including local xclock, swisswatch
(one-second tick), xload, etc.  The XsunMono's VSZ at the end was only
4064 with RSS at only 2772, but this test ran as much as two orders of
magnitude longer than the same test load would without any patches.  No
other process (including snmpd, ntpd, inetd, postfix, cron, etc.) has
died yet either -- only XsunMono, which was the most stressed one.

Note of course that Manuel's original work-around of flushing the whole
cache instead was, I believe, 100% reliable -- just way too sloooowww! :-)

Attached are the changes I've made to pmap.c.  The '#if 0' blocks
represent the history of my tests, with comments documenting their
relative success ratio.

Mixed in here are also some splits of DIAGNOSTIC and DEBUG code
(including one in cache.c) to make my DIAGNOSTIC-only kernel a wee bit
leaner while still catching anything funny happening in this area, plus
a couple of extra panic() checks from OpenBSD (someone should cull their
code for all the new integrity checks they've added!).
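
As a rough illustration of that split (a minimal sketch, not the actual
pmap.c code: the sketch_* names and the recording stand-in for panic()
are made up so that it compiles outside the kernel), a cheap consistency
check goes under DIAGNOSTIC while the verbose tracing stays under DEBUG:

```c
#include <stddef.h>
#include <stdio.h>

/* Normally set from the kernel config; defined here so the sketch
 * compiles on its own.  DEBUG is deliberately left undefined. */
#ifndef DIAGNOSTIC
#define DIAGNOSTIC
#endif

/* Hypothetical stand-in for panic(): records the message instead of
 * halting, so the behaviour can be checked from user space. */
static const char *sketch_last_panic;
static int sketch_debug_lines;	/* counts DEBUG-only trace output */

static void
sketch_panic(const char *msg)
{
	sketch_last_panic = msg;
}

/* Hypothetical, much simplified free-list entry. */
struct sketch_entry {
	void *owner;
	int cookie;
};

static void
sketch_take_free_entry(struct sketch_entry *me)
{
#ifdef DIAGNOSTIC
	/* cheap integrity check: kept in a DIAGNOSTIC-only kernel */
	if (me->owner != NULL)
		sketch_panic("sketch: freelist entry has owner");
#endif
#ifdef DEBUG
	/* verbose tracing: only compiled into DEBUG kernels */
	printf("sketch: got entry %d\n", me->cookie);
	sketch_debug_lines++;
#endif
}
```

Built with only DIAGNOSTIC defined, as here, the check stays in and the
trace printf is compiled out entirely.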

There's also a patch suggested on the OpenBSD sparc mailing list (the
splx(s) move in the diff below, which carries the URL).  On its own it
didn't help, but it seems to make a heck of a lot of sense to me anyway,
though I'm just guessing and comparing to similar Sun-4m code....

As you can see I'm just playing around with attempts to avoid flushing
the whole damn cache on every me_alloc() call.  The one extra call that
seems to be critical, though I haven't really truly tested it all alone,
is this one, which for now I've placed just before the original
cache_flush_segment() call:

			cache_flush_page(va + 63*NBPG);

Since I don't really know what I'm doing here (I don't know any of the
details of how the SPARC cache is supposed to work, and I only know the
minimal theory of how these things work in general), I'm not sure if
there are any sensitivities to order of flushes, etc.  I haven't really
wrapped my head around the relationships between PMEGs, segments,
regions, lines, pages, contexts, PTEs, etc. either (is there any
documentation anywhere with nice pictures showing all these things? :-).
E.g., isn't that magic "64" better found in CACHEINFO.c_totalsize or
something like that?
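
A rough sketch of the arithmetic that seems to be behind that 64,
assuming the usual sun4c MMU geometry of 4 KB pages and 256 KB segments
(the SKETCH_* names and helper functions are made up for illustration
and are not the kernel's own macros):

```c
#include <stdint.h>

/* Assumed sun4c geometry; illustrative names, not kernel macros. */
#define SKETCH_PGSHIFT	12u				/* 4 KB pages */
#define SKETCH_SGSHIFT	18u				/* 256 KB segments */
#define SKETCH_NBPG	(1u << SKETCH_PGSHIFT)
#define SKETCH_NBPSG	(1u << SKETCH_SGSHIFT)
#define SKETCH_NPTESG	(SKETCH_NBPSG / SKETCH_NBPG)	/* pages per segment */

/* Address of the n-th page counted from the segment base seg_va. */
static uint32_t
sketch_page_va(uint32_t seg_va, unsigned n)
{
	return seg_va + n * SKETCH_NBPG;
}

/* Nonzero if va lies inside the segment that starts at seg_va. */
static int
sketch_in_segment(uint32_t seg_va, uint32_t va)
{
	return va >= seg_va && va - seg_va < SKETCH_NBPSG;
}
```

Under those assumptions there are 64 pages per segment, so va + 63*NBPG
is the last page of the segment and va + 64*NBPG is already the first
page of the next one.  CACHEINFO.c_totalsize, by contrast, would be the
cache size (64K on this machine according to the boot messages), which
covers only 16 such pages, so it is probably not the right source for
that constant.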

Anyway I'm going to add two more calls, and then go back to normal use
and see how it fares -- at least it'll be faster than flushing the whole
damn cache and it should be usable for at least a full work day at a time:

			cache_flush_page(va + NBPG);
			cache_flush_page(va + 62*NBPG);

I've put a copy of the kernel (without the above two calls) here:

	ftp://ftp.weird.com/pub/woods/netbsd-sparc-1.6-VERY.6

Note it's trimmed for sun4c _only_!  The complete config file is
available on request.  Here's what it says when it boots on my diskless
SPARCstation 1+:

NetBSD 1.6 (VERY) #6: Sat Sep 21 11:58:10 EDT 2002
    woods@sometimes:/proven/work/woods/NetBSD-1.6/sys/arch/sparc/compile/VERY
total memory = 40880 KB
avail memory = 37444 KB
using 128 buffers containing 512 KB of memory
bootpath: /sbus0/le0
mainbus0 (root): Sun 4/65
cpu0 at mainbus0: MB86900/1A or L64801 @ 25 MHz, WTL3170/2 FPU
cpu0: 64K byte write-through, 16 bytes/line, sw flush: cache enabled
memreg0 at mainbus0 ioaddr 0xf4000000
clock0 at mainbus0 ioaddr 0xf2000000: mk48t02: hostid 51009dd0
timer0 at mainbus0 ioaddr 0xf3000000 ipl 10: delay constant 10
auxreg0 at mainbus0 ioaddr 0xf7400000
zs0 at mainbus0 ioaddr 0xf1000000 ipl 12 softpri 6
zstty0 at zs0 channel 0
zstty1 at zs0 channel 1
zs1 at mainbus0 ioaddr 0xf0000000 ipl 12 softpri 6
kbd0 at zs1 channel 0: baud rate 1200 (console input)
ms0 at zs1 channel 1: baud rate 1200
fdc0 at mainbus0 ioaddr 0xf7200000 ipl 11 softpri 4: chip 82072
audioamd0 at mainbus0 ioaddr 0xf7201000 ipl 13 softpri 4
audio0 at audioamd0: full duplex
sbus0 at mainbus0 ioaddr 0xf8000000: clock = 25 MHz
dma0 at sbus0 slot 0 offset 0x400000: dma rev 1
esp0 at sbus0 slot 0 offset 0x800000 level 3: ESP100, 25MHz, SCSI ID 7
scsibus0 at esp0: 8 targets, 8 luns per target
le0 at sbus0 slot 0 offset 0xc00000 level 5: address 08:00:20:08:34:2f
le0: 8 receive buffers, 2 transmit buffers
bwtwo0 at sbus0 slot 1 offset 0x0 level 7: SUNW,501-1419, 1600 x 1280 (console)
bwtwo0: attached to /dev/fb
scsibus0: waiting 2 seconds for devices to settle...
root on le0
nfs_boot: trying RARP (and RPC/bootparam)
nfs_boot: client_addr=204.92.254.3 (RARP from 204.92.254.18)
nfs_boot: server_addr=204.92.254.18
nfs_boot: hostname=very.weird.com
nfs_boot: gateway=204.92.254.6
nfs_boot: my_mask=255.255.255.0
root on sometimes.weird.com:/very
root file system type: nfs
IP Filter: v3.4.27 initialized.  Default = pass all, Logging = enabled


-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>

Index: pmap.c
===================================================================
RCS file: /cvs/master/m-NetBSD/main/syssrc/sys/arch/sparc/sparc/pmap.c,v
retrieving revision 1.207
diff -c -c -r1.207 pmap.c
*** pmap.c	11 Apr 2002 11:08:40 -0000	1.207
--- pmap.c	21 Sep 2002 21:05:09 -0000
***************
*** 1501,1509 ****
  	/* try free list first */
  	if ((me = segm_freelist.tqh_first) != NULL) {
  		TAILQ_REMOVE(&segm_freelist, me, me_list);
! #ifdef DEBUG
  		if (me->me_pmap != NULL)
  			panic("me_alloc: freelist entry has pmap");
  		if (pmapdebug & PDB_MMU_ALLOC)
  			printf("me_alloc: got pmeg %d\n", me->me_cookie);
  #endif
--- 1501,1511 ----
  	/* try free list first */
  	if ((me = segm_freelist.tqh_first) != NULL) {
  		TAILQ_REMOVE(&segm_freelist, me, me_list);
! #ifdef DIAGNOSTIC
  		if (me->me_pmap != NULL)
  			panic("me_alloc: freelist entry has pmap");
+ #endif
+ #ifdef DEBUG
  		if (pmapdebug & PDB_MMU_ALLOC)
  			printf("me_alloc: got pmeg %d\n", me->me_cookie);
  #endif
***************
*** 1533,1539 ****
  		panic("me_alloc: all pmegs gone");
  
  	pm = me->me_pmap;
! #ifdef DEBUG
  	if (pmapdebug & (PDB_MMU_ALLOC | PDB_MMU_STEAL))
  		printf("me_alloc: stealing pmeg 0x%x from pmap %p\n",
  		    me->me_cookie, pm);
--- 1535,1545 ----
  		panic("me_alloc: all pmegs gone");
  
  	pm = me->me_pmap;
! 	if (pm == NULL)                                                        
! 		panic("me_alloc: LRU entry has no pmap");                      
! 	if (pm == pmap_kernel())                                               
! 		panic("me_alloc: stealing from kernel");                       
!  #ifdef DEBUG
  	if (pmapdebug & (PDB_MMU_ALLOC | PDB_MMU_STEAL))
  		printf("me_alloc: stealing pmeg 0x%x from pmap %p\n",
  		    me->me_cookie, pm);
***************
*** 1553,1560 ****
--- 1559,1570 ----
  #endif
  
  	rp = &pm->pm_regmap[me->me_vreg];
+ 	if (rp->rg_segmap == NULL)                                             
+ 		panic("me_alloc: LRU entry's pmap has no segments");           
  	sp = &rp->rg_segmap[me->me_vseg];
  	pte = sp->sg_pte;
+ 	if (pte == NULL)                                                       
+ 		panic("me_alloc: LRU entry's pmap has no ptes");               
  
  	/*
  	 * The PMEG must be mapped into some context so that we can
***************
*** 1570,1577 ****
  	ctx = getcontext4();
  	if (CTX_USABLE(pm,rp)) {
  		CHANGE_CONTEXTS(ctx, pm->pm_ctxnum);
! 		cache_flush_segment(me->me_vreg, me->me_vseg);
! 		va = VSTOVA(me->me_vreg,me->me_vseg);
  	} else {
  		CHANGE_CONTEXTS(ctx, 0);
  		if (HASSUN4_MMU3L)
--- 1580,1639 ----
  	ctx = getcontext4();
  	if (CTX_USABLE(pm,rp)) {
  		CHANGE_CONTEXTS(ctx, pm->pm_ctxnum);
! #if defined(SUN4C)
! 		/*
! 		 * cache_flush_segment() doesn't seem to work right on non
! 		 * c_hwflush Sun4c machines.  Processes seem to get corrupted
! 		 * address space (stack?) on rare occasions and thus they crash
! 		 * in bizarre states.  This seems to happen more often the
! 		 * larger and longer running a process is.
! 		 *
! 		 * see http://mail-index.netbsd.org/port-sparc/2002/07/27/0001.html
! 		 */
! 		if (CPU_ISSUN4C && !CACHEINFO.c_hwflush) {
! 			va = VSTOVA(me->me_vreg,me->me_vseg);
! # if 0
! 			/*
! 			 * WARNING:  This is _REALLY_ VERY slow....
! 			 */
! 			for (i=0; i < 64; i++)
! 				cache_flush_page(va + i*NBPG);
! # else
! 			/* let's try boundary conditions */
! #  if 0
! 			/* this didn't work, perhaps obviously.... :-) */
! 			cache_flush_page(va);
! 			cache_flush_page(va + 64*NBPG);
! 			cache_flush_segment(me->me_vreg, me->me_vseg);
! #  else
! #   if 0
! 			/*
! 			 * this seemed to work for _much_ longer, but
! 			 * eventually dies....
! 			 */
!  			cache_flush_page(va);
! 			cache_flush_page(va + 63*NBPG);
! 			cache_flush_segment(me->me_vreg, me->me_vseg);
! #   else
! 			/*
! 			 * this also seemed to work for _much_ longer, but also
! 			 * eventually dies -- maybe no better than above?
! 			 */
! 			cache_flush_context();	/* seems quite fast on its own,
! 						 * but it also doesn't work all
! 						 * alone.... */
!  			cache_flush_page(va);
! 			cache_flush_page(va + 63*NBPG);
! 			cache_flush_segment(me->me_vreg, me->me_vseg);
! #   endif
! #  endif
! # endif
! 		} else
! #endif
! 		{
! 			cache_flush_segment(me->me_vreg, me->me_vseg);
! 			va = VSTOVA(me->me_vreg,me->me_vseg);
! 		}
  	} else {
  		CHANGE_CONTEXTS(ctx, 0);
  		if (HASSUN4_MMU3L)
***************
*** 1651,1656 ****
--- 1713,1720 ----
  	if (pmapdebug & PDB_MMU_ALLOC)
  		printf("me_free: freeing pmeg %d from pmap %p\n",
  		    me->me_cookie, pm);
+ #endif
+ #ifdef DIAGNOSTIC
  	if (me->me_cookie != pmeg)
  		panic("me_free: wrong mmuentry");
  	if (pm != me->me_pmap)
***************
*** 1663,1670 ****
  	if (CTX_USABLE(pm,rp)) {
  		va = VSTOVA(vr,me->me_vseg);
  	} else {
! #ifdef DEBUG
! if (getcontext4() != 0) panic("me_free: ctx != 0");
  #endif
  		if (HASSUN4_MMU3L)
  			setregmap(0, tregion);
--- 1727,1735 ----
  	if (CTX_USABLE(pm,rp)) {
  		va = VSTOVA(vr,me->me_vseg);
  	} else {
! #ifdef DIAGNOSTIC
! 		if (getcontext4() != 0)
! 			panic("me_free: ctx != 0");
  #endif
  		if (HASSUN4_MMU3L)
  			setregmap(0, tregion);
***************
*** 1729,1737 ****
  	/* try free list first */
  	if ((me = region_freelist.tqh_first) != NULL) {
  		TAILQ_REMOVE(&region_freelist, me, me_list);
! #ifdef DEBUG
  		if (me->me_pmap != NULL)
  			panic("region_alloc: freelist entry has pmap");
  		if (pmapdebug & PDB_MMUREG_ALLOC)
  			printf("region_alloc: got smeg 0x%x\n", me->me_cookie);
  #endif
--- 1794,1804 ----
  	/* try free list first */
  	if ((me = region_freelist.tqh_first) != NULL) {
  		TAILQ_REMOVE(&region_freelist, me, me_list);
! #ifdef DIAGNOSTIC
  		if (me->me_pmap != NULL)
  			panic("region_alloc: freelist entry has pmap");
+ #endif
+ #ifdef DEBUG
  		if (pmapdebug & PDB_MMUREG_ALLOC)
  			printf("region_alloc: got smeg 0x%x\n", me->me_cookie);
  #endif
***************
*** 1818,1823 ****
--- 1885,1892 ----
  	if (pmapdebug & PDB_MMUREG_ALLOC)
  		printf("region_free: freeing smeg 0x%x from pmap %p\n",
  		    me->me_cookie, pm);
+ #endif
+ #ifdef DIAGNOSTIC
  	if (me->me_cookie != smeg)
  		panic("region_free: wrong mmuentry");
  	if (pm != me->me_pmap)
***************
*** 2014,2020 ****
  		 */
  
  		setcontext4(cnum);
- 		splx(s);
  		if (doflush)
  			cache_flush_context();
  
--- 2083,2088 ----
***************
*** 2041,2046 ****
--- 2109,2115 ----
  				rp++;
  			}
  		}
+ 		splx(s); /* suggested on <URL:http://marc.theaimsgroup.com/?l=openbsd-sparc&m=101753710814521&w=2> */
  
  	} else if (CPU_ISSUN4M) {
  
Index: cache.c
===================================================================
RCS file: /cvs/master/m-NetBSD/main/syssrc/sys/arch/sparc/sparc/cache.c,v
retrieving revision 1.61
diff -c -c -r1.61 cache.c
*** cache.c	25 Jan 2002 19:19:46 -0000	1.61
--- cache.c	21 Sep 2002 02:01:28 -0000
***************
*** 423,429 ****
  	int i, ls;
  	char *p;
  
! #ifdef DEBUG
  	if (va & PGOFSET)
  		panic("cache_flush_page: asked to flush misaligned va 0x%x",va);
  #endif
--- 423,429 ----
  	int i, ls;
  	char *p;
  
! #ifdef DIAGNOSTIC
  	if (va & PGOFSET)
  		panic("cache_flush_page: asked to flush misaligned va 0x%x",va);
  #endif