port-i386: Re: i386 syscall: int vs. trap gate

Subject: Re: i386 syscall: int vs. trap gate
To: None <port-i386@NetBSD.ORG>
From: Charles Hannum <mycroft@deshaw.com>
List: port-i386
Date: 01/06/1996 23:45:13

In article <199601050012.TAA07745@amsterdam.lcs.mit.edu>
dm@amsterdam.lcs.mit.edu (David Mazieres) writes:

   Could someone explain to me the advantage of using int instructions
   rather than call gates for system calls?  There obviously must be one
   since NetBSD went through the trouble of switching to int.

   I thought I remembered hearing that it was faster, but at least on the
   Pentium (only documentation I have) timing for the two operations in
   cycles is:
		call 44   int  48
		retl 23   iret 27
   Thus, if you have a Pentium and you believe the Intel book (two big
   ifs), the call gate would actually come out to be 8 cycles faster.

I'm pretty sure I covered this when I made the change, though not in
as much detail.  To review:

1) We have to restore the flags register when exiting the kernel, so
we simply use `iret' to exit in all cases.  This, combined with saving
and modifying the flags on entry, more than nullifies the savings you
get from the cheaper `call' instruction.  The reasons we have to save
and restore the flags are:

   a) We have to turn off the trace flag.  A call gate doesn't do this
   for us, but an interrupt gate does.

   b) The flags may need to change in a way that can only take effect
   when we exit back to user mode.  In particular:

      * We may be returning from a signal handler, which changes the
      condition flags.

      * We may be entering or exiting VM86 mode.

   c) Using a different exit sequence or frame format would be fairly
   expensive.  In some cases, we'd have to convert the frame format.
   In many cases, we'd have to duplicate code.

2) To make the frame format consistent with interrupts (as mentioned
above), the call gate is configured to automatically copy in one word
from the user stack, which occupies the slot where an interrupt would
push the flags.
This probably costs a cycle or two, but I don't have my manuals here
to check.  We could rearrange the frame manually, but this would be
more expensive.

3) We're required to have the interrupt gate for Linux emulation
anyway.  Since we have to support both in the kernel, the only
question is which one our native executables use.

4) The instruction to enter the interrupt gate is 2 bytes, rather than
7, so you save a little memory here and there (though not much).
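
The size difference in point 4 can be checked from the instruction
encodings.  The following is a minimal sketch, not code from the
kernel: the vector 0x80 and the selector 7 are assumptions chosen for
illustration (the message above does not state them), and the array
and function names are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/*
 * Encodings shown for illustration only.  The vector (0x80) and the
 * selector (7) are assumptions, not values taken from the message.
 */

/* int $0x80: opcode 0xCD followed by an 8-bit vector number. */
static const unsigned char int_insn[] = { 0xcd, 0x80 };

/* lcall $7,$0: opcode 0x9A, a 32-bit offset, then a 16-bit selector. */
static const unsigned char lcall_insn[] = {
    0x9a, 0x00, 0x00, 0x00, 0x00, 0x07, 0x00
};

static_assert(sizeof int_insn == 2, "int imm8 is 2 bytes");
static_assert(sizeof lcall_insn == 7, "far call ptr16:32 is 7 bytes");

/* Bytes saved at each system call site by using `int'. */
size_t bytes_saved_per_site(void)
{
    return sizeof lcall_insn - sizeof int_insn;
}
```

The saving is per call site, so it only adds up in proportion to the
number of system call stubs linked into an executable.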

In summary, using the interrupt gate is a small win, though certainly
nothing to write home about.
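
Points 1 and 2 can be illustrated with the two stack frames the
processor builds on entry from user mode.  This is a rough sketch
under stated assumptions, not the kernel's actual code: the struct
names, the EFL_TF macro, and call_gate_fixup() are all hypothetical,
and the fix-up is written in C here only to show the idea.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Trace (single-step) flag, bit 8 of EFLAGS.  Macro name is made up. */
#define EFL_TF 0x00000100u

/* Pushed by the CPU for an interrupt gate entered from user mode,
 * lowest address first. */
struct int_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;
    uint32_t ss;
};

/* Pushed by the CPU for a 32-bit call gate whose parameter count is
 * one, lowest address first. */
struct call_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t param;   /* one word copied from the user stack */
    uint32_t esp;
    uint32_t ss;
};

/* The copied word lands exactly where an interrupt keeps the flags. */
static_assert(offsetof(struct call_frame, param) ==
              offsetof(struct int_frame, eflags), "slots line up");
static_assert(sizeof(struct call_frame) == sizeof(struct int_frame),
              "frames are the same size");

/*
 * Hypothetical call gate entry fix-up: store the flags the user was
 * running with into the spare slot so `iret' can restore them later,
 * and return the flags the kernel should run with (trace flag off).
 */
uint32_t call_gate_fixup(struct call_frame *f, uint32_t live_eflags)
{
    f->param = live_eflags;
    return live_eflags & ~EFL_TF;
}
```

With an interrupt gate the processor does both steps itself, which is
the extra work on the call gate path that point 1 refers to.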