Subject: Re: UVM/PMAP_NEW on i386 panics ... not anymore
To: Matthias Drochner <drochner@zelux6.zel.kfa-juelich.de>
From: Stefan Grefen <grefen@hprc.tandem.com>
List: port-i386
Date: 05/25/1998 16:29:20
In message <18539.895850937@hrriss.hprc.tandem.com>  Stefan Grefen wrote:

[...] 

> 
> > Do you see performance differences?
> 
> I'll test it, I will have results after the weekend.
> I expect to see a difference in syscall performance.

Here we are. I used the Byte Unix benchmarks and a slightly modified
syscall benchmark (the original has no copyouts in it, so I added
an ioctl(0,FIONREAD,&xx) and a gettimeofday call).
Ignore the date; there seems to be a problem with the RTC chip.
Gnome1 is a 386SX 40 MHz with 8 MB, diskless, no swap.
(It is an Acer PC-on-a-chip on a PC/104 board.)
==============================================================

My version:

  BYTE UNIX Benchmarks (Version 3.11)
  System -- NetBSD gnome1 1.3E NetBSD 1.3E (SMALL) #89: Mon May 25 14:17:24 CEST 1998 grefen@hicks:/usr/src/sys/arch/i386/compile/SMALL i386
  Start Benchmark Run: Fri May  1 02:59:16 CET 1970
   1 interactive users.
System Call Overhead Test                  2290.0 lps   (3 secs, 32 samples)
Pipe Throughput Test                       1141.1 lps   (3 secs, 32 samples)
Pipe-based Context Switching Test           515.4 lps   (3 secs, 32 samples)
Process Creation Test                        18.3 lps   (3 secs, 32 samples)
Execl Throughput Test                         4.9 lps   (2 secs, 32 samples)
System Call Overhead Test (copyout)        1243.6 lps   (3 secs, 32 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Execl Throughput Test                           16.5        4.9        0.3
Pipe-based Context Switching Test             1318.5      515.4        0.4
                                                                 =========
     SUM of  2 items                                                   0.7
     AVERAGE                                                           0.3
=================================
The stuff in the tree:
  BYTE UNIX Benchmarks (Version 3.11)
  System -- NetBSD gnome1 1.3E NetBSD 1.3E (SMALL) #88: Mon May 25 11:30:58 CEST 1998 grefen@hicks:/usr/src/sys/arch/i386/compile/SMALL i386
  Start Benchmark Run: Fri May  1 01:57:18 CET 1970
   1 interactive users.
System Call Overhead Test                  2252.5 lps   (3 secs, 32 samples)
Pipe Throughput Test                       1115.6 lps   (3 secs, 32 samples)
Pipe-based Context Switching Test           500.9 lps   (3 secs, 32 samples)
Process Creation Test                        18.8 lps   (3 secs, 32 samples)
Execl Throughput Test                         4.8 lps   (2 secs, 32 samples)
System Call Overhead Test (copyout)        1155.6 lps   (3 secs, 32 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Execl Throughput Test                           16.5        4.8        0.3
Pipe-based Context Switching Test             1318.5      500.9        0.4
                                                                 =========
     SUM of  2 items                                                   0.7
     AVERAGE                                                           0.3
-------


My version compared with the stuff in the tree:

System Call Overhead Test               1.6 % faster
Pipe Throughput Test                    2.2 % faster
Pipe-based Context Switching Test       2.8 % faster
Process Creation Test                   2.7 % slower
Execl Throughput Test                   2.0 % faster
System Call Overhead Test (copyout)     7.6 % faster
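
As a cross-check of the percentages above, here is a minimal sketch that
recomputes them from the lps values of the two runs. The names (struct row,
pct_diff, print_comparison) are hypothetical and only serve the illustration.
The result depends on which run is used as the reference: dividing the
difference by my version's value reproduces the first five rows as posted,
while the 7.6 % for the copyout test matches division by the in-tree value
(divided by my version's value it comes out at about 7.1 %).

```c
#include <assert.h>
#include <stdio.h>

/* One row of the comparison: loops per second measured on each kernel. */
struct row {
    const char *test;
    double mine;   /* "My version" */
    double tree;   /* "The stuff in the tree" */
};

/* lps values copied from the two benchmark runs above. */
static const struct row rows[] = {
    { "System Call Overhead Test",           2290.0, 2252.5 },
    { "Pipe Throughput Test",                1141.1, 1115.6 },
    { "Pipe-based Context Switching Test",    515.4,  500.9 },
    { "Process Creation Test",                 18.3,   18.8 },
    { "Execl Throughput Test",                  4.9,    4.8 },
    { "System Call Overhead Test (copyout)", 1243.6, 1155.6 },
};

/* Difference between the two runs, in percent of the chosen reference. */
static double pct_diff(double mine, double tree, double reference)
{
    assert(reference > 0.0);
    return 100.0 * (mine - tree) / reference;
}

/* Print both conventions side by side; positive means my version is faster. */
static void print_comparison(void)
{
    size_t i;

    for (i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
        const struct row *r = &rows[i];

        printf("%-38s %+5.1f %% (of mine)  %+5.1f %% (of tree)\n",
               r->test,
               pct_diff(r->mine, r->tree, r->mine),
               pct_diff(r->mine, r->tree, r->tree));
    }
}
```

Calling print_comparison() from a main() prints one line per test. Either
convention leads to the same conclusion: the copyout variant shows by far
the largest difference.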


This shows that if copyout is involved, the code in the tree adds a 
significant overhead (esp. for small amounts of data) to the copyout
function.

System Call Overhead Test 
	close(dup(0));
	getpid();
	getuid();
	umask(022);
    As there is no copyout here, this must be gained just in the loops around
    the benchmark.

Pipe Throughput Test 
	char    buf[512];
	write(pvec[1], buf, sizeof(buf));
	read(pvec[0], buf, sizeof(buf));

Pipe-based Context Switching Test
	unsigned long iter,check;
	fork();
	write(p1[1],&iter, sizeof(iter));
	read(p2[0],&check, sizeof(check));

Process Creation Test
	fork()?wait(&status):exit(0);
    No idea why this is slower (no copyout??).
    Maybe because the parent is (still) COW after the fork and the child's exit.

Execl Throughput Test

System Call Overhead Test (copyout)
	int v;
	struct timeval tv;
	ioctl(0,FIONREAD,&v);
	close(dup(0));
	gettimeofday(&tv,NULL);
	getpid();
	getuid();
	umask(022);
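
For anyone who wants to repeat the copyout measurement, the loop body above
can be wrapped in a small timing function. This is only a sketch under
assumptions, not the Byte benchmark source: it runs a fixed number of
iterations and times them with gettimeofday, and the function names
time_copyout_loop and loops_per_second are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Run the copyout variant of the syscall loop `iters` times and return
 * the elapsed wall-clock time in seconds. Return values of the calls are
 * ignored on purpose; only the trip through the kernel matters here.
 */
static double time_copyout_loop(unsigned long iters)
{
    struct timeval start, end, tv;
    unsigned long i;
    int v;

    assert(iters > 0);  /* a zero-iteration run would measure nothing */

    gettimeofday(&start, NULL);
    for (i = 0; i < iters; i++) {
        ioctl(0, FIONREAD, &v);   /* on success the kernel stores an int in v */
        close(dup(0));
        gettimeofday(&tv, NULL);  /* the kernel fills in a struct timeval */
        getpid();
        getuid();
        umask(022);
    }
    gettimeofday(&end, NULL);

    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/* Convert an iteration count and elapsed time into loops per second. */
static double loops_per_second(unsigned long iters, double seconds)
{
    return seconds > 0.0 ? (double)iters / seconds : 0.0;
}
```

A caller would pick an iteration count large enough to run for a few seconds
and pass the result of time_copyout_loop to loops_per_second. Note that the
loop changes the process umask to 022 as a side effect.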


======

I think the ugliness of handling it in the trap is acceptable for the
performance increase (upgrading the hardware is not an option for people
running on those systems).
I think the impact on a lot of programs is even bigger (more like the
copyout syscall test), as this particular piece is not exercised by the
standard Byte benchmarks.


Stefan


--
Stefan Grefen                                Tandem Computers Europe Inc.
grefen@hprc.tandem.com                       High Performance Research Center
 --- Hacking's just another word for nothing left to kludge. ---