Subject: Syscall and syscall versioning documentation for review
To: None <tech-kern@netbsd.org>
From: Pavel Cahyna <pavel@netbsd.org>
List: tech-kern
Date: 08/30/2006 03:35:33
--T4sUOijqQbZv57TR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hello,

I wrote some documentation about how the syscalls work, mainly from the PoV of
versioning them. Please review.
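
For reviewers who want the naming rules from the versioning section at a
glance, here is a minimal sketch in C. It is not part of the patch:
version_names() and compat_name() are hypothetical helpers that only build
the name strings, assuming the release x.y is written as the tag "xy".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SYSNAME_LEN 64

/* Names involved when syscall "foo" is versioned after release x.y. */
struct versioned_names {
	char new_syscall[SYSNAME_LEN];	/* __fooxy */
	char new_func[SYSNAME_LEN];	/* sys___fooxy */
	char compat_func[SYSNAME_LEN];	/* compat_xy_sys_foo */
};

/* Hypothetical helper: prefix a kernel function name with compat_<ver>_. */
static void
compat_name(const char *ver, const char *func, char *out, size_t outlen)
{
	int n = snprintf(out, outlen, "compat_%s_%s", ver, func);

	assert(n >= 0 && (size_t)n < outlen);	/* the name must fit */
}

/* Hypothetical helper: build all three names for one versioning step. */
static void
version_names(const char *name, const char *ver, struct versioned_names *out)
{
	char orig_func[SYSNAME_LEN];

	/* New syscall: two underscores, the old name, then the release tag. */
	snprintf(out->new_syscall, sizeof(out->new_syscall), "__%s%s", name, ver);
	/* Kernel function implementing the new syscall. */
	snprintf(out->new_func, sizeof(out->new_func), "sys_%s", out->new_syscall);
	/* The original function sys_<name> gets the compat prefix. */
	snprintf(orig_func, sizeof(orig_func), "sys_%s", name);
	compat_name(ver, orig_func, out->compat_func, sizeof(out->compat_func));
}
```

With name "foo" and tag "xy" this yields the three names used in the text.
Passing the new function name to compat_name() with the tag "zq" gives the
name after a second versioning.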

Pavel
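
P.S. The rule that the compatibility version should be implemented in terms
of the new version could look roughly like the sketch below. Everything in
it is an assumption made for illustration: the foo syscall, its added flags
argument, the two argument structures and the demo_register_t stand-in are
hypothetical and do not exist in the tree. It compiles in userland only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types, not the real definitions. */
struct lwp { int l_dummy; };
typedef long demo_register_t;

/* Hypothetical syscallarg structures for the two versions. */
struct sys___fooxy_args { int a; int b; int flags; };	/* new version */
struct compat_xy_sys_foo_args { int a; int b; };	/* old version */

/* New version: the only place where the real work is done. */
int
sys___fooxy(struct lwp *l, void *v, demo_register_t *retval)
{
	struct sys___fooxy_args *uap = v;

	(void)l;
	assert(uap != NULL && retval != NULL);
	*retval = uap->a + uap->b + uap->flags;
	return 0;
}

/* Compatibility version: converts the old arguments, then calls the new one. */
int
compat_xy_sys_foo(struct lwp *l, void *v, demo_register_t *retval)
{
	struct compat_xy_sys_foo_args *oap = v;
	struct sys___fooxy_args nap;

	assert(oap != NULL);
	nap.a = oap->a;
	nap.b = oap->b;
	nap.flags = 0;	/* old binaries had no flags argument */
	return sys___fooxy(l, &nap, retval);
}
```

Because the wrapper holds no logic of its own beyond the argument
conversion, a later fix in sys___fooxy reaches old binaries as well.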

--T4sUOijqQbZv57TR
Content-Type: text/plain; charset=unknown-8bit
Content-Disposition: attachment; filename="chap-processes.html.diff"
Content-Transfer-Encoding: 8bit

Index: chap-processes.html
===================================================================
RCS file: /cvsroot/htdocs/Documentation/internals/en/chap-processes.html,v
retrieving revision 1.5
diff -u -r1.5 chap-processes.html
--- chap-processes.html	27 Mar 2006 14:42:56 -0000	1.5
+++ chap-processes.html	30 Aug 2006 01:26:36 -0000
@@ -110,70 +110,56 @@
       implementation in the NetBSD kernel when executing a native 32 bit ELF 
       binary on an i386 machine:</p>
 <div class="itemizedlist"><ul type="disc"><li>
-<p>
-        <code class="filename">src/sys/kern/kern_exec.c</code>: 
+<p>        <code class="filename">src/sys/kern/kern_exec.c</code>: 
         <code class="function">sys_execve</code>
 	</p>
 <div class="itemizedlist"><ul type="circle"><li>
-<p>
-          <code class="filename">src/sys/kern/kern_exec.c</code>: 
+<p>          <code class="filename">src/sys/kern/kern_exec.c</code>: 
           <code class="function">execve1</code>
 	  </p>
 <div class="itemizedlist"><ul type="square">
 <li>
-<p>
-            <code class="filename">src/sys/kern/kern_exec.c</code>: 
+<p>            <code class="filename">src/sys/kern/kern_exec.c</code>: 
             <code class="function">check_exec</code>
             </p>
 <div class="itemizedlist"><ul type="disc">
-<li>
-              <code class="filename">src/sys/kern/kern_verifiedexec.c</code>: 
+<li>              <code class="filename">src/sys/kern/kern_verifiedexec.c</code>: 
               <code class="function">veriexec_verify</code>
               </li>
 <li>
-<p>
-              <code class="filename">src/sys/kern/kern_conf.c</code>: 
+<p>              <code class="filename">src/sys/kern/kern_conf.c</code>: 
               <code class="function">*execsw[]-&gt;es_makecmds</code> 
               </p>
 <div class="itemizedlist"><ul type="circle"><li>
-<p>
-                <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<p>                <code class="filename">src/sys/kern/exec_elf32.c</code>:
                 <code class="function">exec_elf_makecmds</code>
                 </p>
 <div class="itemizedlist"><ul type="square">
-<li>
-                  <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<li>                  <code class="filename">src/sys/kern/exec_elf32.c</code>:
                   <code class="function">exec_check_header</code>
                   </li>
-<li>
-                  <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<li>                  <code class="filename">src/sys/kern/exec_elf32.c</code>:
                   <code class="function">exec_read_from</code>
                   </li>
 <li>
-<p>
-                  <code class="filename">src/sys/kern/exec_conf.c</code>:
+<p>                  <code class="filename">src/sys/kern/exec_conf.c</code>:
                   <code class="function">*execsw[]-&gt;u.elf_probe_func</code>
                   </p>
-<div class="itemizedlist"><ul type="disc"><li>
-                    <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<div class="itemizedlist"><ul type="disc"><li>                    <code class="filename">src/sys/kern/exec_elf32.c</code>:
                     <code class="function">netbsd_elf_probe</code>
                     </li></ul></div>
 </li>
-<li>
-                  <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<li>                  <code class="filename">src/sys/kern/exec_elf32.c</code>:
                   <code class="function">elf_load_psection</code>
                   </li>
-<li>
-                  <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<li>                  <code class="filename">src/sys/kern/exec_elf32.c</code>:
                   <code class="function">elf_load_file</code>
                   </li>
 <li>
-<p>
-                  <code class="filename">src/sys/kern/exec_conf.c</code>:
+<p>                  <code class="filename">src/sys/kern/exec_conf.c</code>:
                   <code class="function">*execsw[]-&gt;es_setup_stack</code>
                   </p>
-<div class="itemizedlist"><ul type="disc"><li>
-                    <code class="filename">src/sys/kern/exec_subr.c</code>:
+<div class="itemizedlist"><ul type="disc"><li>                    <code class="filename">src/sys/kern/exec_subr.c</code>:
                     <code class="function">exec_setup_stack</code>
                     </li></ul></div>
 </li>
@@ -183,97 +169,76 @@
 </ul></div>
 </li>
 <li>
-<p>
-          <code class="function">*fetch_element</code> 
+<p>          <code class="function">*fetch_element</code> 
           </p>
-<div class="itemizedlist"><ul type="disc"><li>
-	    <code class="filename">src/sys/kern/kern_exec.c</code>:
+<div class="itemizedlist"><ul type="disc"><li>	    <code class="filename">src/sys/kern/kern_exec.c</code>:
             <code class="function">execve_fetch_element</code>
             </li></ul></div>
 </li>
 <li>
-<p>
-          <code class="function">*vcp-&gt;ev_proc</code>
+<p>          <code class="function">*vcp-&gt;ev_proc</code>
           </p>
 <div class="itemizedlist"><ul type="disc">
-<li>
-	    <code class="filename">src/sys/kern/exec_subr.c</code>:
+<li>	    <code class="filename">src/sys/kern/exec_subr.c</code>:
             <code class="function">vmcmd_map_zero</code>
             </li>
-<li>
-	    <code class="filename">src/sys/kern/exec_subr.c</code>:
+<li>	    <code class="filename">src/sys/kern/exec_subr.c</code>:
             <code class="function">vmcmd_map_pagedvn</code>
             </li>
-<li>
-	    <code class="filename">src/sys/kern/exec_subr.c</code>:
+<li>	    <code class="filename">src/sys/kern/exec_subr.c</code>:
             <code class="function">vmcmd_map_readvn</code>
             </li>
-<li>
-	    <code class="filename">src/sys/kern/exec_subr.c</code>:
+<li>	    <code class="filename">src/sys/kern/exec_subr.c</code>:
             <code class="function">vmcmd_readvn</code>
             </li>
 </ul></div>
 </li>
 <li>
-<p>
-	  <code class="filename">src/sys/kern/exec_conf.c</code>:
+<p>	  <code class="filename">src/sys/kern/exec_conf.c</code>:
           <code class="function">*execsw[]-&gt;es_copyargs</code> 
           </p>
-<div class="itemizedlist"><ul type="disc"><li>
-	    <code class="filename">src/sys/kern/kern_exec.c</code>:
+<div class="itemizedlist"><ul type="disc"><li>	    <code class="filename">src/sys/kern/kern_exec.c</code>:
             <code class="function">copyargs</code>
             </li></ul></div>
 </li>
-<li>
-	  <code class="filename">src/sys/kern/kern_clock.c</code>:
+<li>	  <code class="filename">src/sys/kern/kern_clock.c</code>:
           <code class="function">stopprofclock</code>
           </li>
-<li>
-	  <code class="filename">src/sys/kern/kern_descrip.c</code>:
+<li>	  <code class="filename">src/sys/kern/kern_descrip.c</code>:
           <code class="function">fdcloseexec</code>
           </li>
-<li>
-	  <code class="filename">src/sys/kern/kern_sig.c</code>:
+<li>	  <code class="filename">src/sys/kern/kern_sig.c</code>:
           <code class="function">execsigs</code>
           </li>
-<li>
-	  <code class="filename">src/sys/kern/kern_ras.c</code>:
+<li>	  <code class="filename">src/sys/kern/kern_ras.c</code>:
           <code class="function">ras_purgeall</code>
           </li>
-<li>
-	  <code class="filename">src/sys/kern/exec_subr.c</code>:
+<li>	  <code class="filename">src/sys/kern/exec_subr.c</code>:
           <code class="function">doexechooks</code>
           </li>
 <li>
-<p>
-	  <code class="filename">src/sys/sys/event.h</code>:
+<p>	  <code class="filename">src/sys/sys/event.h</code>:
           <code class="function">KNOTE</code>
           </p>
-<div class="itemizedlist"><ul type="disc"><li>
-	    <code class="filename">src/sys/kern/kern_event.c</code>:
+<div class="itemizedlist"><ul type="disc"><li>	    <code class="filename">src/sys/kern/kern_event.c</code>:
 	    <code class="function">knote</code>
             </li></ul></div>
 </li>
 <li>
-<p>
-	  <code class="filename">src/sys/kern/exec_conf.c</code>:
+<p>	  <code class="filename">src/sys/kern/exec_conf.c</code>:
           <code class="function">*execsw[]-&gt;es_setregs</code>
           </p>
-<div class="itemizedlist"><ul type="disc"><li>
-	    <code class="filename">src/sys/arch/i386/i386/machdep.c</code>:
+<div class="itemizedlist"><ul type="disc"><li>	    <code class="filename">src/sys/arch/i386/i386/machdep.c</code>:
             <code class="function">setregs</code>
             </li></ul></div>
 </li>
-<li>
-	  <code class="filename">src/sys/kern/kern_exec.c</code>:
+<li>	  <code class="filename">src/sys/kern/kern_exec.c</code>:
           <code class="function">exec_sigcode_map</code>
           </li>
-<li>
-	  <code class="filename">src/sys/kern/kern_exec.c</code>:
+<li>	  <code class="filename">src/sys/kern/kern_exec.c</code>:
           <code class="function">*p-&gt;p_emul-&gt;e_proc_exit</code> (NULL)
           </li>
-<li>
-	  <code class="filename">src/sys/kern/kern_exec.c</code>:
+<li>	  <code class="filename">src/sys/kern/kern_exec.c</code>:
           <code class="function">*p-&gt;p_emul-&gt;e_proc_exec</code> (NULL)
           </li>
 </ul></div>
@@ -312,7 +277,7 @@
       can find here various methods called within <code class="function">execve</code>
       code path.</p>
 <div class="table">
-<a name="id2661980"></a><p class="title"><b>Table 3.1. <span class="type">struct execsw</span> fields summary</b></p>
+<a name="id40649558"></a><p class="title"><b>Table 3.1. <span class="type">struct execsw</span> fields summary</b></p>
 <table summary="struct execsw fields summary" border="1">
 <colgroup>
 <col>
@@ -350,7 +315,7 @@
 <td><code class="varname">es_emul</code></td>
 <td>The <span class="type">struct emul</span> used for handling different
               kernel ABI. It is covered in detail in 
-             <a href="chap-processes.html#emul_switch" title="3.2.3. Multiple kernel ABI support with the emul switch">Section 3.2.3, &#8220;Multiple kernel ABI support with the emul switch&#8221;</a>.</td>
+             <a href="chap-processes.html#emul_switch" title="3.2.2. Multiple kernel ABI support with the emul switch">Section 3.2.2, &#8220;Multiple kernel ABI support with the emul switch&#8221;</a>.</td>
 </tr>
 <tr>
 <td><code class="varname">es_prio</code></td>
@@ -391,7 +356,7 @@
 <p>The <code class="function">es_makecmds</code> will fill the exec package's
       <code class="varname">ep_vmcmds</code> field with vmcmds that will be used later
       for setting up the new process virtual memory space. See 
-      <a href="chap-processes.html#vmcmds" title="3.1.3.2. Virtual memory space setup commands (vmcmds)">Section 3.1.3.2, &#8220;Virtual memory space setup commands (vmcmds)&#8221;</a> for details about the vmcmds.</p>
+      <a href="chap-processes.html#vmcmds" title="3.1.3.2. Virtual memory space setup commands (vmcmds)">Section 3.1.3.2, &#8220;Virtual memory space setup commands (vmcmds)&#8221;</a> for details about the vmcmds.</p>
 <div class="sect3" lang="en">
 <div class="titlepage"><div><div><h4 class="title">
 <a name="format_probe"></a>3.1.3.1. Executable format probe</h4></div></div></div>
@@ -419,7 +384,7 @@
 <p>Four methods are available in 
         <code class="filename">src/sys/kern/exec_subr.c</code></p>
 <div class="table">
-<a name="id2662276"></a><p class="title"><b>Table 3.2. vmcmd methods</b></p>
+<a name="id40649781"></a><p class="title"><b>Table 3.2. vmcmd methods</b></p>
 <table summary="vmcmd methods" border="1">
 <colgroup>
 <col>
@@ -543,12 +508,7 @@
 </div>
 <div class="sect2" lang="en">
 <div class="titlepage"><div><div><h3 class="title">
-<a name="libc_syscall"></a>3.2.2. System call implementation in libc</h3></div></div></div>
-<p>XXX write me</p>
-</div>
-<div class="sect2" lang="en">
-<div class="titlepage"><div><div><h3 class="title">
-<a name="emul_switch"></a>3.2.3. Multiple kernel ABI support with the emul switch</h3></div></div></div>
+<a name="emul_switch"></a>3.2.2. Multiple kernel ABI support with the emul switch</h3></div></div></div>
 <p>The <span class="type">struct emul</span> is defined in 
         <code class="filename">src/sys/sys/proc.h</code>. It defines various methods
         and parameters to handle system calls and traps. Each kernel ABI
@@ -567,10 +527,14 @@
 </div>
 <div class="sect2" lang="en">
 <div class="titlepage"><div><div><h3 class="title">
-<a name="syscalls_master"></a>3.2.4. The syscalls.master table</h3></div></div></div>
+<a name="syscalls_master"></a>3.2.3. The syscalls.master table</h3></div></div></div>
 <p>Each kernel ABI have a system call table. The table maps system
       call numbers to functions implementing the system call in the kernel
-      (e.g.: system call number 2 is <code class="function">fork</code>).
+      (e.g.: system call number 2 is <code class="function">fork</code>). The
+      convention (for native syscalls) is that the kernel function
+      implementing syscall <code class="function">foo</code>
+      is called <code class="function">sys_foo</code>. Emulation syscalls have
+      their own conventions, like the linux_sys_ prefix for the Linux emulation.
       The native system call table can be found in 
       <code class="filename">src/sys/kern/syscalls.master</code>.</p>
 <p>This file is not written in C language. After any change, it
@@ -580,7 +544,7 @@
       <code class="filename">syscalls.conf</code>, and it will output several 
       files:</p>
 <div class="table">
-<a name="id2662689"></a><p class="title"><b>Table 3.3. Files produced from <code class="filename">syscalls.master</code></b></p>
+<a name="id40650184"></a><p class="title"><b>Table 3.3. Files produced from <code class="filename">syscalls.master</code></b></p>
 <table summary="Files produced from syscalls.master" border="1">
 <colgroup>
 <col>
@@ -605,7 +569,7 @@
 <tr>
 <td><code class="filename">syscall.h</code></td>
 <td>Preprocessor defines for each system call name and 
-              number</td>
+              number - used in libc</td>
 </tr>
 <tr>
 <td><code class="filename">sysent.c</code></td>
@@ -669,10 +633,206 @@
       calling thread, <em class="parameter"><code>v</code></em> is the syscallarg structure
       pointer, and <em class="parameter"><code>retval</code></em> is a pointer to the return
       value.</p>
+<p>While generating the files listed above, some substitutions
+      on the function names are performed: syscalls tagged as
+      COMPAT_XX get the prefix compat_xx_. The actual kernel
+      functions implementing those syscalls therefore have to be
+      named accordingly. For example, if
+      <code class="filename">syscalls.master</code> has the line
+</p>
+<pre class="programlisting">97	COMPAT_30	{ int sys_socket(int domain, int type, int protocol); }</pre>
+<p>
+	the actual syscall function will have this prototype:
+        </p>
+<div class="funcsynopsis">
+<table border="0" summary="Function synopsis" cellspacing="0" cellpadding="0" style="padding-bottom: 1em">
+<tr>
+<td><code class="funcdef">int <b class="fsfunc">compat_30_sys_socket</b>(</code></td>
+<td>
+<var class="pdparam">l</var>, </td>
+<td> </td>
+</tr>
+<tr>
+<td> </td>
+<td>
+<var class="pdparam">v</var>, </td>
+<td> </td>
+</tr>
+<tr>
+<td> </td>
+<td>
+<var class="pdparam">retval</var><code>)</code>;</td>
+<td> </td>
+</tr>
+</table>
+<table border="0" summary="Function argument synopsis" cellspacing="0" cellpadding="0">
+<tr>
+<td>struct lwp * </td>
+<td>
+<var class="pdparam">l</var>;</td>
+</tr>
+<tr>
+<td>void *  </td>
+<td>
+<var class="pdparam">v</var>;</td>
+</tr>
+<tr>
+<td>register_t * </td>
+<td>
+<var class="pdparam">retval</var>;</td>
+</tr>
+</table>
+</div>
+<p>
+	and <em class="parameter"><code>v</code></em> is a pointer to struct
+	compat_30_sys_socket_args.
+      </p>
+</div>
+<div class="sect2" lang="en">
+<div class="titlepage"><div><div><h3 class="title">
+<a name="libc_syscall"></a>3.2.4. System call implementation in libc</h3></div></div></div>
+<p>The system call implementation in libc is autogenerated
+      from the kernel implementation. The
+      <code class="filename">syscall.h</code> file contains defines which map
+      the syscall names to syscall numbers. The define names are
+      derived from the kernel function names by replacing the sys_
+      prefix with SYS_. Including "SYS.h" provides this header file
+      and the RSYSCALL macro. The macro accepts the syscall name
+      without a prefix, prepends SYS_ to look up the corresponding
+      number, and defines a function of the given name whose body
+      just executes the syscall with that number. (How the syscall is
+      executed and how the number and the arguments are passed is
+      machine dependent; RSYSCALL hides these details.) </p>
+<p>This means that e.g. the implementation of the access(2)
+      function in libc consists of an access.S file containing just:
+</p>
+<pre class="programlisting">#include "SYS.h"
+RSYSCALL(access)</pre>
+<p>
+
+      To automate this further, it is enough to add the name of this
+      file to the ASM variable in
+      <code class="filename">libc/sys/Makefile.inc</code> and the file will be
+      autogenerated with this content. </p>
+<p>This is true for libc functions which correspond exactly
+      to the kernel syscalls. It is not always the case, even if the
+      functions are found in section 2 of the manuals. For example the
+      wait, wait3 and waitpid functions are implemented as wrappers of
+      only one syscall, wait4. In such a case the procedure above yields
+      the wait4 function and the wrappers can reference it as if it
+      were a normal C function. </p>
+</div>
+<div class="sect2" lang="en">
+<div class="titlepage"><div><div><h3 class="title">
+<a name="id40650381"></a>3.2.5. Versioning a system call</h3></div></div></div>
+<p>If the system call ABI (or even API) changes, it is
+    necessary to implement the old syscall with the original semantics
+    to be used by old binaries. The new version of the syscall has a
+    different syscall number, while the original one retains the old
+    number. This is called versioning.</p>
+<p>The naming conventions associated with versioning are
+    complex. If the original system call is called foo (and
+    implemented by a sys_foo function) and it is changed after the x.y
+    release, the new syscall will be named __fooxy, with the function
+    implementing it being named sys___fooxy. The original syscall
+    (left for compatibility) will still be declared as sys_foo in
+    <code class="filename">syscalls.master</code>, but will be tagged as
+    COMPAT_XY, so the function will be named compat_xy_sys_foo. We
+    will call sys_foo the original version, sys___fooxy the new
+    version and compat_xy_sys_foo the compatibility version in the
+    procedure described below.</p>
+<p>Now if the syscall is versioned again after version z.q has
+    been released, the newest version will be called __foozq. The
+    intermediate version (formerly the new version) will have to be
+    retained for compatibility, so it will be tagged as COMPAT_ZQ,
+    which will change the function name from sys___fooxy to
+    compat_zq_sys___fooxy. The oldest version compat_xy_sys_foo will
+    be unaffected by the second versioning.
+    </p>
+<p>What needs to be done:
+    </p>
+<div class="itemizedlist"><ul type="disc">
+<li>tag the old version with COMPAT_XY in
+    <code class="filename">syscalls.master</code>
+    </li>
+<li>add the new version at the end of
+    <code class="filename">syscalls.master</code> (this effectively allocates a
+    new syscall number)
+    </li>
+<li>name the new version as described above: the syscall
+    becomes __fooxy and the function implementing it
+    becomes sys___fooxy
+    </li>
+<li>implement the compatibility version, name it
+    compat_xy_sys_... as described above. The implementation belongs
+    under <code class="filename">src/sys/compat</code> and it shouldn't be a
+    modified copy of the new version, because the copies would
+    eventually diverge. Rather, it should be implemented in terms of
+    the new version, adding the adjustments needed for compatibility
+    (which means that it should behave exactly as the old version did.)
+    </li>
+<li>find all references to the old syscall function in the
+    kernel and point them to the compatibility version or to the new
+    version as appropriate. (The kernel would not link otherwise.)
+    </li>
+</ul></div>
+<p>
+    Now the kernel should be compilable and old statically linked
+    binaries should work, as should binaries using the old
+    libc. Nothing uses the new syscall yet. We have to make a new
+    libc, which will contain both the new and the compatibility
+    syscall:
+    </p>
+<div class="itemizedlist"><ul type="disc">
+<li>in <code class="filename">libc/sys/Makefile.inc</code>, replace
+    the name of the old syscall with that of the new one (__fooxy in
+    our example). When libc is rebuilt, it will contain the new function,
+    but no programs use this internal name with underscores yet, so it is
+    not useful so far. Also, we have lost the old name.</li>
+<li>
+<p>To make newly compiled programs use the new syscall when
+    they refer to the usual name (foo in our example), we add a
+    __RENAME(__fooxy) statement after the declaration of foo in the
+    system header file where foo is declared:
+    </p>
+<pre class="programlisting">int     foo(int, int, int)
+#if !defined(__LIBC12_SOURCE__) &amp;&amp; !defined(_STANDALONE)
+__RENAME(__fooxy)
+#endif</pre>
+<p>
+    Now, when a program is recompiled using this header, references to
+    foo will be replaced by __fooxy, except for compilation of
+    standalone tools (basically bootloaders) and libc itself. Old
+    binaries are unaware of this and continue to reference foo.
+    </p>
+</li>
+<li>To make the old binaries work with the new libc, we must
+    add the old function. We add it under
+    <code class="filename">libc/compat/sys</code>, implementing it using the
+    new function. Note that we did not use the compatibility syscall
+    in the kernel at all, so old programs will work with the new libc,
+    even if the kernel is built without COMPAT_XY. The compatibility
+    syscall is there only for the old libc, which is used if the
+    shared library was not upgraded, or internally by statically
+    linked programs. </li>
+</ul></div>
+<p>
+    We are done. This covers old binaries with the old libc and a new kernel
+    (including statically linked binaries), old binaries with the new libc
+    and a new kernel, and new binaries with the new libc and a new kernel.
+    </p>
+<p>When committing, one should remember to commit the source
+    (<code class="filename">syscalls.master</code>) for the autogenerated files
+    first, and then regenerate and commit the autogenerated
+    files. They contain the RCS Id of the source file and this way,
+    the RCS Id will refer to the current source version. The assembly
+    files generated by <code class="filename">libc/sys/Makefile.inc</code> are
+    not kept in the repository at all; they are regenerated every time
+    libc is built.</p>
 </div>
 <div class="sect2" lang="en">
 <div class="titlepage"><div><div><h3 class="title">
-<a name="to64"></a>3.2.5. Managing 32 bit system calls on 64 bit systems</h3></div></div></div>
+<a name="to64"></a>3.2.6. Managing 32 bit system calls on 64 bit systems</h3></div></div></div>
 <p>When executing 32 bit binaries on a 64 bit system, care must be
       taken to only use addresses below 4 GB. This is a problem at 
       process creation, when the stack and heap are allocated, but also for

--T4sUOijqQbZv57TR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="chap-processes.xml.diff"

Index: chap-processes.xml
===================================================================
RCS file: /cvsroot/htdocs/Documentation/internals/en/chap-processes.xml,v
retrieving revision 1.4
diff -u -r1.4 chap-processes.xml
--- chap-processes.xml	3 Mar 2006 12:01:21 -0000	1.4
+++ chap-processes.xml	30 Aug 2006 01:33:48 -0000
@@ -565,12 +565,7 @@
       <title>Traps</title>
       <para>XXX write me</para>
     </sect2>
-
-    <sect2 id="libc_syscall">
-      <title>System call implementation in libc</title>
-      <para>XXX write me</para>
-    </sect2>
-
+    
     <sect2 id="emul_switch">
       <title>Multiple kernel ABI support with the emul switch</title>
         <para>The <type>struct emul</type> is defined in 
@@ -596,7 +591,11 @@
       <title>The syscalls.master table</title>
       <para>Each kernel ABI have a system call table. The table maps system
       call numbers to functions implementing the system call in the kernel
-      (e.g.: system call number 2 is <function>fork</function>).
+      (e.g.: system call number 2 is <function>fork</function>). The
+      convention (for native syscalls) is that the kernel function
+      implementing syscall <function>foo</function>
+      is called <function>sys_foo</function>. Emulation syscalls have
+      their own conventions, like the linux_sys_ prefix for the Linux emulation.
       The native system call table can be found in 
       <filename>src/sys/kern/syscalls.master</filename>.</para>
 
@@ -633,7 +632,7 @@
             <row>
               <entry><filename>syscall.h</filename></entry>
               <entry>Preprocessor defines for each system call name and 
-              number</entry>
+              number - used in libc</entry>
             </row>
             <row>
               <entry><filename>sysent.c</filename></entry>
@@ -648,7 +647,7 @@
       <para>In order to avoid namespace collision, non native ABI have 
       <filename>syscalls.conf</filename> defining output file names prefixed
       by tags (e.g: linux_ for Linux ABI).</para>
-
+      
       <para>system call argument structures (syscallarg for short) are 
       always used to pass arguments to functions implementing the system
       calls. Each system call has its own syscallarg structure. This 
@@ -668,6 +667,194 @@
       calling thread, <parameter>v</parameter> is the syscallarg structure
       pointer, and <parameter>retval</parameter> is a pointer to the return
       value.</para>
+
+      <para>While generating the files listed above, some substitutions
+      on the function names are performed: syscalls tagged as
+      COMPAT_XX get the prefix compat_xx_. The actual kernel
+      functions implementing those syscalls therefore have to be
+      named accordingly. For example, if
+      <filename>syscalls.master</filename> has the line
+<programlisting>
+<![CDATA[
+97	COMPAT_30	{ int sys_socket(int domain, int type, int protocol); }
+]]>
+</programlisting>
+	the actual syscall function will have this prototype:
+        <funcsynopsis>
+          <funcprototype>
+            <funcdef>int <function>compat_30_sys_socket</function></funcdef>
+            <paramdef>struct lwp *<parameter>l</parameter></paramdef>
+            <paramdef>void * <parameter>v</parameter></paramdef>
+            <paramdef>register_t *<parameter>retval</parameter></paramdef>
+          </funcprototype>
+	</funcsynopsis>
+	and <parameter>v</parameter> is a pointer to struct
+	compat_30_sys_socket_args.
+      </para>
+
+    </sect2>
+
+    <sect2 id="libc_syscall">
+      <title>System call implementation in libc</title> 
+      <para>The system call implementation in libc is autogenerated
+      from the kernel implementation. The
+      <filename>syscall.h</filename> file contains defines which map
+      the syscall names to syscall numbers. The define names are
+      derived from the kernel function names by replacing the sys_
+      prefix with SYS_. Including "SYS.h" provides this header file
+      and the RSYSCALL macro. The macro accepts the syscall name
+      without a prefix, prepends SYS_ to look up the corresponding
+      number, and defines a function of the given name whose body
+      just executes the syscall with that number. (How the syscall is
+      executed and how the number and the arguments are passed is
+      machine dependent; RSYSCALL hides these details.) </para>
+      
+      <para>This means that e.g. the implementation of the access(2)
+      function in libc consists of an access.S file containing just:
+<programlisting>
+<![CDATA[
+#include "SYS.h"
+RSYSCALL(access)
+]]>
+</programlisting>
+
+      To automate this further, it is enough to add the name of this
+      file to the ASM variable in
+      <filename>libc/sys/Makefile.inc</filename> and the file will be
+      autogenerated with this content. </para>
+
+      <para>This is true for libc functions which correspond exactly
+      to the kernel syscalls. It is not always the case, even if the
+      functions are found in section 2 of the manuals. For example the
+      wait, wait3 and waitpid functions are implemented as wrappers of
+      only one syscall, wait4. In such a case the procedure above yields
+      the wait4 function and the wrappers can reference it as if it
+      were a normal C function. </para>
+
+    </sect2>
+
+    <sect2><title>Versioning a system call</title>
+    <para>If the system call ABI (or even API) changes, it is
+    necessary to implement the old syscall with the original semantics
+    to be used by old binaries. The new version of the syscall has a
+    different syscall number, while the original one retains the old
+    number. This is called versioning.</para>
+
+    <para>The naming conventions associated with versioning are
+    complex. If the original system call is called foo (and
+    implemented by a sys_foo function) and it is changed after the x.y
+    release, the new syscall will be named __fooxy, with the function
+    implementing it being named sys___fooxy. The original syscall
+    (left for compatibility) will still be declared as sys_foo in
+    <filename>syscalls.master</filename>, but will be tagged as
+    COMPAT_XY, so the function will be named compat_xy_sys_foo. We
+    will call sys_foo the original version, sys___fooxy the new
+    version and compat_xy_sys_foo the compatibility version in the
+    procedure described below.</para>
+    <para>Now if the syscall is versioned again after version z.q has
+    been released, the newest version will be called __foozq. The
+    intermediate version (formerly the new version) will have to be
+    retained for compatibility, so it will be tagged as COMPAT_ZQ,
+    which will change the function name from sys___fooxy to
+    compat_zq_sys___fooxy. The oldest version compat_xy_sys_foo will
+    be unaffected by the second versioning.
+    </para>
+
+    <para>What needs to be done:
+    <itemizedlist>
+    <listitem>
+    <simpara>tag the old version with COMPAT_XY in
+    <filename>syscalls.master</filename>
+    </simpara>
+    </listitem>
+    <listitem>
+    <simpara>add the new version at the end of
+    <filename>syscalls.master</filename> (this effectively allocates a
+    new syscall number)
+    </simpara>
+    </listitem>
+    <listitem>
+    <simpara>name the new version
+    as described above:
+    the syscall becomes __fooxy
+    and the function implementing it
+    becomes sys___fooxy
+    </simpara>
+    </listitem>
+    <listitem>
+    <simpara>implement the compatibility version, name it
+    compat_xy_sys_... as described above. The implementation belongs
+    under <filename>src/sys/compat</filename> and it shouldn't be a
+    modified copy of the new version, because the copies would
+    eventually diverge. Rather, it should be implemented in terms of
+    the new version, adding the adjustments needed for compatibility
+    (which means that it should behave exactly as the old version did.)
+    </simpara>
+    </listitem>
+    <listitem>
+    <simpara>find all references to the old syscall function in the
+    kernel and point them to the compatibility version or to the new
+    version as appropriate. (The kernel would not link otherwise.)
+    </simpara>
+    </listitem>
+    </itemizedlist>
+    Now the kernel should be compilable and old statically linked
+    binaries should work, as should binaries using the old
+    libc. Nothing uses the new syscall yet. We have to make a new
+    libc, which will contain both the new and the compatibility
+    syscall:
+    <itemizedlist>
+    <listitem>
+    <simpara>in <filename>libc/sys/Makefile.inc</filename>, replace
+    the name of the old syscall with that of the new one (__fooxy in
+    our example). When libc is rebuilt, it will contain the new function,
+    but no programs use this internal name with underscores yet, so it is
+    not useful so far. Also, we have lost the old name.</simpara>
+    </listitem>
+    <listitem>
+    <para>To make newly compiled programs use the new syscall when
+    they refer to the usual name (foo in our example), we add a
+    __RENAME(__fooxy) statement after the declaration of foo in the
+    system header file where foo is declared:
+    <programlisting>
+<![CDATA[
+int     foo(int, int, int)
+#if !defined(__LIBC12_SOURCE__) && !defined(_STANDALONE)
+__RENAME(__fooxy)
+#endif
+]]>
+</programlisting>
+    Now, when a program is recompiled using this header, references to
+    foo will be replaced by __fooxy, except for compilation of
+    standalone tools (basically bootloaders) and libc itself. Old
+    binaries are unaware of this and continue to reference foo.
+    </para>
+    </listitem>
+    <listitem>
+    <simpara>To make the old binaries work with the new libc, we must
+    add the old function. We add it under
+    <filename>libc/compat/sys</filename>, implementing it using the
+    new function. Note that we did not use the compatibility syscall
+    in the kernel at all, so old programs will work with the new libc,
+    even if the kernel is built without COMPAT_XY. The compatibility
+    syscall is there only for the old libc, which is used if the
+    shared library was not upgraded, or internally by statically
+    linked programs. </simpara>
+    </listitem>
+    </itemizedlist>
+    We are done. This covers old binaries with the old libc and a new kernel
+    (including statically linked binaries), old binaries with the new libc
+    and a new kernel, and new binaries with the new libc and a new kernel.
+    </para>
+
+    <para>When committing, one should remember to commit the source
+    (<filename>syscalls.master</filename>) for the autogenerated files
+    first, and then regenerate and commit the autogenerated
+    files. They contain the RCS Id of the source file and this way,
+    the RCS Id will refer to the current source version. The assembly
+    files generated by <filename>libc/sys/Makefile.inc</filename> are
+    not kept in the repository at all; they are regenerated every time
+    libc is built.</para>
     </sect2>
 
     <sect2 id="to64">

--T4sUOijqQbZv57TR--