Subject: Re: In-kernel RAS
To: Andrew Doran <ad@netbsd.org>
From: David Laight <david@l8s.co.uk>
List: tech-kern
Date: 11/12/2007 21:44:03
On Mon, Nov 12, 2007 at 03:07:13PM +0000, Andrew Doran wrote:
> On Mon, Nov 12, 2007 at 03:15:41PM +0100, Joerg Sonnenberger wrote:
> 
> > currently the return to userland code on x86 is quite expensive because
> > the AST check has to use cli/sti which are not cheap. I've been
> > wondering whether we could use RAS to handle this case better and other
> > situations as well.
> 
> I tried it with a RAS and it didn't save much. I can't remember exactly but
> it was somewhere around 30 clock cycles on a P4. It's not actually a big
> deal because cli is followed by iret/sysexit/sysret which are serializing
> and mask cli's presence.
> 
> The big win is using sysenter/sysexit. Doing that requires a 4GB segment
> limit which breaks our non-executable stack protection. The NX bit can
> be used, but it's only available on newer processors and requires that
> the pmap uses the multi-level PAE pagetable format like amd64.

Related to syscall times is the fact that the libc stubs contain a
branch that the P4 (but not the Athlon) will mispredict.
There might be a measurable gain from making the CPU default to correctly
predicting the branch.
Specifically all the i386 and amd64 syscall stubs do a 'jc 2b' to test the
carry flag (to set errno) on syscall return.  Since this is a backwards
jump that is not taken when the syscall succeeds, the P4 will predict it
as taken (unless the branch is in the prediction tables) and suffer a
pipeline stall - which is (IIRC) at least 14 cycles!
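
To make the reasoning above concrete, here is a rough C model of the
static prediction rule as described (backward branch guessed taken,
forward branch guessed not taken, when the branch is not in the
prediction tables). It is only a sketch: the function names are
hypothetical, the 14-cycle figure is the unverified "IIRC" number from
this mail, and it assumes a branch that is already in the tables is
predicted correctly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed penalty: the "at least 14 cycles" figure quoted from memory. */
enum { MISPREDICT_CYCLES = 14 };

/* Static guess for a branch with no entry in the prediction tables:
 * a target at or below the branch address (backward) is guessed taken,
 * a target above it (forward) is guessed not taken. */
bool static_predict_taken(uintptr_t branch_addr, uintptr_t target_addr)
{
    return target_addr <= branch_addr;
}

/* Cycles lost on a cold branch, given what the branch really did. */
int cold_branch_penalty(uintptr_t branch_addr, uintptr_t target_addr,
                        bool actually_taken)
{
    bool predicted_taken = static_predict_taken(branch_addr, target_addr);
    return (predicted_taken == actually_taken) ? 0 : MISPREDICT_CYCLES;
}

/* Average cycles lost per successful syscall through a backward 'jc',
 * where cold_fraction is the share of calls for which the branch is
 * not in the prediction tables (0.0 .. 1.0). */
double expected_penalty_per_call(double cold_fraction)
{
    assert(cold_fraction >= 0.0 && cold_fraction <= 1.0);
    return cold_fraction * MISPREDICT_CYCLES;
}
```

With made-up addresses, cold_branch_penalty(0x1010, 0x1000, false)
models a successful syscall returning through the backward 'jc 2b' and
gives 14, while the same branch pointed forward (target 0x1020) gives 0.
How large cold_fraction is in a real application is exactly the part
that would need measuring.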

I can't test this since all my systems are AMD.  It also needs to be
checked in some real app where the branch target cache won't contain
info for the syscall being checked.

	David

-- 
David Laight: david@l8s.co.uk