Subject: Re: amd64 syscall.c question
To: None <tech-kern@netbsd.org, port-amd64@netbsd.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 06/07/2007 09:56:52
On Thu, Jun 07, 2007 at 12:55:22PM +1000, Simon Burge wrote:

> Christos Zoulas wrote:
> 
> > This can be fixed by compiling the same code twice. I think that mycroft
> > measured the overhead of plain/fancy and he decided that the split was
> > worth it.
> 
> As a data point, I converted pc532 to use syscall_{plain,fancy}() last
> year.  It sped up lmbench's "lat_syscall null" benchmark by about 20%.

On a CPU where every instruction cycle counts I can see that being true.
I was talking about modern day CPUs. :-)

Example: on a Pentium D I ran some tests. A simple syscall (either getpid
or _lwp_getprivate, I can't remember) takes about 1150 cycles to execute.
Replacing the rep; movsl in copyin() with a RISC style loop brings it down
to about 1000. Using sysenter / sysexit instead of int 0x80 / iret brings
it down to 590. Making the return path a restartable sequence and nuking
the use of cli/sti shaves off about another 50 cycles (I can't remember
the exact figure). Replacing syscall_plain with syscall_fancy makes no
measurable difference.
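
For anyone who wants to repeat this kind of measurement, here is a rough
sketch of a null-syscall timing loop. It is not the harness I used: it
reports nanoseconds from clock_gettime() rather than cycles, it assumes
getpid() really traps into the kernel (some libc versions cache the pid,
which would make the result meaninglessly low), and the names now_ns and
null_syscall_ns are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Read the monotonic clock and return it as nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Average wall-clock cost, in nanoseconds, of one getpid() call.
 * Assumption: getpid() is not cached by libc, so each call is a
 * real syscall. The figure includes loop overhead.
 */
static double null_syscall_ns(unsigned long iterations)
{
    volatile pid_t sink;    /* keeps the compiler from dropping the calls */
    uint64_t start, end;
    unsigned long i;

    assert(iterations > 0);

    start = now_ns();
    for (i = 0; i < iterations; i++)
        sink = getpid();
    end = now_ns();
    (void)sink;

    return (double)(end - start) / (double)iterations;
}
```

Calling null_syscall_ns(1000000) before and after a kernel change gives a
crude comparison. Multiply by the clock rate to get a ballpark cycle count,
and run it several times since the numbers are noisy.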

(As a point of interest, the Pentium 3 is much faster at making syscalls in
_real time_ than later offerings from Intel, unless you use sysenter and
sysexit, or syscall / sysret in EM64T mode.)

We could look at moving the call to trace_exit into userret (if possible; I
think systrace may get in the way). We already have a flag for exceptional
work to be done, LW_USERRET. On the way in, we could have another flag
(LW_USERENT or something) that handles trace_enter and updating creds.
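
To make the idea concrete, here is a small user-space sketch of what a
single dispatch path with one flag test on the way in and one on the way
out might look like. It is not kernel code: struct lwp_sketch, slow_enter,
slow_exit, sys_null and syscall_sketch are invented names, the flag values
are made up, and LW_USERENT is only the hypothetical flag suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LW_USERRET 0x01u    /* existing flag name; value here is made up */
#define LW_USERENT 0x02u    /* hypothetical entry-side flag */

/* Stand-in for the per-LWP state; counters record slow-path visits. */
struct lwp_sketch {
    uint32_t l_flag;
    unsigned enter_slow;
    unsigned exit_slow;
};

/* Where trace_enter and the cred update would go in this scheme. */
static void slow_enter(struct lwp_sketch *l)
{
    l->enter_slow++;
}

/* Where trace_exit and other exceptional return work would go. */
static void slow_exit(struct lwp_sketch *l)
{
    l->exit_slow++;
}

/* Trivial handler standing in for a simple syscall. */
static int sys_null(void)
{
    return 0;
}

/*
 * One dispatch routine for every process. The common case pays for
 * two flag tests and nothing else.
 */
static int syscall_sketch(struct lwp_sketch *l, int (*handler)(void))
{
    int error;

    assert(l != NULL && handler != NULL);

    if (l->l_flag & LW_USERENT)
        slow_enter(l);

    error = handler();

    if (l->l_flag & LW_USERRET)
        slow_exit(l);

    return error;
}
```

With l_flag clear neither slow path runs; setting either flag on a traced
process routes it through the corresponding hook without needing a second
copy of the dispatch code.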

Andrew