Subject: Syscall and syscall versioning documentation for review
To: None <tech-kern@netbsd.org>
From: Pavel Cahyna <pavel@netbsd.org>
List: tech-kern
Date: 08/30/2006 03:35:33
--T4sUOijqQbZv57TR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Hello,
I wrote some documentation about how system calls work, mainly from the point of view of
versioning them. Please review.
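To make the naming rules in the versioning section easier to check, here is a small C sketch that derives the names from a syscall name and a release number. It only encodes the conventions as the patch describes them: sys_foo for the native kernel function, __fooxy for the new version after release x.y, the compat_xy_ prefix once an entry is tagged COMPAT_XY, and the SYS_ define that libc uses. All helper names are hypothetical, and nothing like this exists in the NetBSD tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch of the syscall naming conventions described in the patch.
 * All function names here are hypothetical illustrations.
 */

/* Release number without dots: "3.0" -> "30" (the xy in COMPAT_XY). */
static void
release_tag(char *buf, size_t len, const char *release)
{
	size_t n = 0;

	assert(len > 0);
	for (; *release != '\0'; release++) {
		if (*release == '.')
			continue;
		assert(n + 1 < len);	/* leave room for the NUL */
		buf[n++] = *release;
	}
	buf[n] = '\0';
}

/* Native kernel function for a syscall name: foo -> sys_foo. */
static void
kernel_function(char *buf, size_t len, const char *call)
{
	int n = snprintf(buf, len, "sys_%s", call);

	assert(n >= 0 && (size_t)n < len);	/* not truncated */
}

/* New syscall after release x.y: foo -> __fooxy. */
static void
new_syscall_name(char *buf, size_t len, const char *call,
    const char *release)
{
	char tag[16];
	int n;

	release_tag(tag, sizeof(tag), release);
	n = snprintf(buf, len, "__%s%s", call, tag);
	assert(n >= 0 && (size_t)n < len);
}

/* Entry tagged COMPAT_XY: sys_foo -> compat_xy_sys_foo. */
static void
compat_function(char *buf, size_t len, const char *kfunc,
    const char *release)
{
	char tag[16];
	int n;

	release_tag(tag, sizeof(tag), release);
	n = snprintf(buf, len, "compat_%s_%s", tag, kfunc);
	assert(n >= 0 && (size_t)n < len);
}

/* Define in syscall.h used by libc: sys_foo -> SYS_foo. */
static void
libc_define(char *buf, size_t len, const char *kfunc)
{
	int n;

	assert(strncmp(kfunc, "sys_", 4) == 0);
	n = snprintf(buf, len, "SYS_%s", kfunc + 4);
	assert(n >= 0 && (size_t)n < len);
}
```

The socket example from the patch works as follows. new_syscall_name() on "socket" and "3.0" gives __socket30. kernel_function() on that gives sys___socket30, and libc_define() turns it into SYS___socket30. compat_function() on sys_socket with "3.0" gives compat_30_sys_socket. Passing sys___socket30 and a second, purely hypothetical release "4.0" to compat_function() gives compat_40_sys___socket30, which is the intermediate-version case.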
Pavel
--T4sUOijqQbZv57TR
Content-Type: text/plain; charset=unknown-8bit
Content-Disposition: attachment; filename="chap-processes.html.diff"
Content-Transfer-Encoding: 8bit
Index: chap-processes.html
===================================================================
RCS file: /cvsroot/htdocs/Documentation/internals/en/chap-processes.html,v
retrieving revision 1.5
diff -u -r1.5 chap-processes.html
--- chap-processes.html 27 Mar 2006 14:42:56 -0000 1.5
+++ chap-processes.html 30 Aug 2006 01:26:36 -0000
@@ -110,70 +110,56 @@
implementation in the NetBSD kernel when executing a native 32 bit ELF
binary on an i386 machine:</p>
<div class="itemizedlist"><ul type="disc"><li>
-<p>
- <code class="filename">src/sys/kern/kern_exec.c</code>:
+<p> <code class="filename">src/sys/kern/kern_exec.c</code>:
<code class="function">sys_execve</code>
</p>
<div class="itemizedlist"><ul type="circle"><li>
-<p>
- <code class="filename">src/sys/kern/kern_exec.c</code>:
+<p> <code class="filename">src/sys/kern/kern_exec.c</code>:
<code class="function">execve1</code>
</p>
<div class="itemizedlist"><ul type="square">
<li>
-<p>
- <code class="filename">src/sys/kern/kern_exec.c</code>:
+<p> <code class="filename">src/sys/kern/kern_exec.c</code>:
<code class="function">check_exec</code>
</p>
<div class="itemizedlist"><ul type="disc">
-<li>
- <code class="filename">src/sys/kern/kern_verifiedexec.c</code>:
+<li> <code class="filename">src/sys/kern/kern_verifiedexec.c</code>:
<code class="function">veriexec_verify</code>
</li>
<li>
-<p>
- <code class="filename">src/sys/kern/kern_conf.c</code>:
+<p> <code class="filename">src/sys/kern/kern_conf.c</code>:
<code class="function">*execsw[]->es_makecmds</code>
</p>
<div class="itemizedlist"><ul type="circle"><li>
-<p>
- <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<p> <code class="filename">src/sys/kern/exec_elf32.c</code>:
<code class="function">exec_elf_makecmds</code>
</p>
<div class="itemizedlist"><ul type="square">
-<li>
- <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<li> <code class="filename">src/sys/kern/exec_elf32.c</code>:
<code class="function">exec_check_header</code>
</li>
-<li>
- <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<li> <code class="filename">src/sys/kern/exec_elf32.c</code>:
<code class="function">exec_read_from</code>
</li>
<li>
-<p>
- <code class="filename">src/sys/kern/exec_conf.c</code>:
+<p> <code class="filename">src/sys/kern/exec_conf.c</code>:
<code class="function">*execsw[]->u.elf_probe_func</code>
</p>
-<div class="itemizedlist"><ul type="disc"><li>
- <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<div class="itemizedlist"><ul type="disc"><li> <code class="filename">src/sys/kern/exec_elf32.c</code>:
<code class="function">netbsd_elf_probe</code>
</li></ul></div>
</li>
-<li>
- <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<li> <code class="filename">src/sys/kern/exec_elf32.c</code>:
<code class="function">elf_load_psection</code>
</li>
-<li>
- <code class="filename">src/sys/kern/exec_elf32.c</code>:
+<li> <code class="filename">src/sys/kern/exec_elf32.c</code>:
<code class="function">elf_load_file</code>
</li>
<li>
-<p>
- <code class="filename">src/sys/kern/exec_conf.c</code>:
+<p> <code class="filename">src/sys/kern/exec_conf.c</code>:
<code class="function">*execsw[]->es_setup_stack</code>
</p>
-<div class="itemizedlist"><ul type="disc"><li>
- <code class="filename">src/sys/kern/exec_subr.c</code>:
+<div class="itemizedlist"><ul type="disc"><li> <code class="filename">src/sys/kern/exec_subr.c</code>:
<code class="function">exec_setup_stack</code>
</li></ul></div>
</li>
@@ -183,97 +169,76 @@
</ul></div>
</li>
<li>
-<p>
- <code class="function">*fetch_element</code>
+<p> <code class="function">*fetch_element</code>
</p>
-<div class="itemizedlist"><ul type="disc"><li>
- <code class="filename">src/sys/kern/kern_exec.c</code>:
+<div class="itemizedlist"><ul type="disc"><li> <code class="filename">src/sys/kern/kern_exec.c</code>:
<code class="function">execve_fetch_element</code>
</li></ul></div>
</li>
<li>
-<p>
- <code class="function">*vcp->ev_proc</code>
+<p> <code class="function">*vcp->ev_proc</code>
</p>
<div class="itemizedlist"><ul type="disc">
-<li>
- <code class="filename">src/sys/kern/exec_subr.c</code>:
+<li> <code class="filename">src/sys/kern/exec_subr.c</code>:
<code class="function">vmcmd_map_zero</code>
</li>
-<li>
- <code class="filename">src/sys/kern/exec_subr.c</code>:
+<li> <code class="filename">src/sys/kern/exec_subr.c</code>:
<code class="function">vmcmd_map_pagedvn</code>
</li>
-<li>
- <code class="filename">src/sys/kern/exec_subr.c</code>:
+<li> <code class="filename">src/sys/kern/exec_subr.c</code>:
<code class="function">vmcmd_map_readvn</code>
</li>
-<li>
- <code class="filename">src/sys/kern/exec_subr.c</code>:
+<li> <code class="filename">src/sys/kern/exec_subr.c</code>:
<code class="function">vmcmd_readvn</code>
</li>
</ul></div>
</li>
<li>
-<p>
- <code class="filename">src/sys/kern/exec_conf.c</code>:
+<p> <code class="filename">src/sys/kern/exec_conf.c</code>:
<code class="function">*execsw[]->es_copyargs</code>
</p>
-<div class="itemizedlist"><ul type="disc"><li>
- <code class="filename">src/sys/kern/kern_exec.c</code>:
+<div class="itemizedlist"><ul type="disc"><li> <code class="filename">src/sys/kern/kern_exec.c</code>:
<code class="function">copyargs</code>
</li></ul></div>
</li>
-<li>
- <code class="filename">src/sys/kern/kern_clock.c</code>:
+<li> <code class="filename">src/sys/kern/kern_clock.c</code>:
<code class="function">stopprofclock</code>
</li>
-<li>
- <code class="filename">src/sys/kern/kern_descrip.c</code>:
+<li> <code class="filename">src/sys/kern/kern_descrip.c</code>:
<code class="function">fdcloseexec</code>
</li>
-<li>
- <code class="filename">src/sys/kern/kern_sig.c</code>:
+<li> <code class="filename">src/sys/kern/kern_sig.c</code>:
<code class="function">execsigs</code>
</li>
-<li>
- <code class="filename">src/sys/kern/kern_ras.c</code>:
+<li> <code class="filename">src/sys/kern/kern_ras.c</code>:
<code class="function">ras_purgeall</code>
</li>
-<li>
- <code class="filename">src/sys/kern/exec_subr.c</code>:
+<li> <code class="filename">src/sys/kern/exec_subr.c</code>:
<code class="function">doexechooks</code>
</li>
<li>
-<p>
- <code class="filename">src/sys/sys/event.h</code>:
+<p> <code class="filename">src/sys/sys/event.h</code>:
<code class="function">KNOTE</code>
</p>
-<div class="itemizedlist"><ul type="disc"><li>
- <code class="filename">src/sys/kern/kern_event.c</code>:
+<div class="itemizedlist"><ul type="disc"><li> <code class="filename">src/sys/kern/kern_event.c</code>:
<code class="function">knote</code>
</li></ul></div>
</li>
<li>
-<p>
- <code class="filename">src/sys/kern/exec_conf.c</code>:
+<p> <code class="filename">src/sys/kern/exec_conf.c</code>:
<code class="function">*execsw[]->es_setregs</code>
</p>
-<div class="itemizedlist"><ul type="disc"><li>
- <code class="filename">src/sys/arch/i386/i386/machdep.c</code>:
+<div class="itemizedlist"><ul type="disc"><li> <code class="filename">src/sys/arch/i386/i386/machdep.c</code>:
<code class="function">setregs</code>
</li></ul></div>
</li>
-<li>
- <code class="filename">src/sys/kern/kern_exec.c</code>:
+<li> <code class="filename">src/sys/kern/kern_exec.c</code>:
<code class="function">exec_sigcode_map</code>
</li>
-<li>
- <code class="filename">src/sys/kern/kern_exec.c</code>:
+<li> <code class="filename">src/sys/kern/kern_exec.c</code>:
<code class="function">*p->p_emul->e_proc_exit</code> (NULL)
</li>
-<li>
- <code class="filename">src/sys/kern/kern_exec.c</code>:
+<li> <code class="filename">src/sys/kern/kern_exec.c</code>:
<code class="function">*p->p_emul->e_proc_exec</code> (NULL)
</li>
</ul></div>
@@ -312,7 +277,7 @@
can find here various methods called within <code class="function">execve</code>
code path.</p>
<div class="table">
-<a name="id2661980"></a><p class="title"><b>Table 3.1. <span class="type">struct execsw</span> fields summary</b></p>
+<a name="id40649558"></a><p class="title"><b>Table 3.1. <span class="type">struct execsw</span> fields summary</b></p>
<table summary="struct execsw fields summary" border="1">
<colgroup>
<col>
@@ -350,7 +315,7 @@
<td><code class="varname">es_emul</code></td>
<td>The <span class="type">struct emul</span> used for handling different
kernel ABI. It is covered in detail in
- <a href="chap-processes.html#emul_switch" title="3.2.3. Multiple kernel ABI support with the emul switch">Section 3.2.3, “Multiple kernel ABI support with the emul switch”</a>.</td>
+ <a href="chap-processes.html#emul_switch" title="3.2.2. Multiple kernel ABI support with the emul switch">Section 3.2.2, “Multiple kernel ABI support with the emul switch”</a>.</td>
</tr>
<tr>
<td><code class="varname">es_prio</code></td>
@@ -391,7 +356,7 @@
<p>The <code class="function">es_makecmds</code> will fill the exec package's
<code class="varname">ep_vmcmds</code> field with vmcmds that will be used later
for setting up the new process virtual memory space. See
- <a href="chap-processes.html#vmcmds" title="3.1.3.2. Virtual memory space setup commands (vmcmds)">Section 3.1.3.2, “Virtual memory space setup commands (vmcmds)”</a> for details about the vmcmds.</p>
+ <a href="chap-processes.html#vmcmds" title="3.1.3.2. Virtual memory space setup commands (vmcmds)">Section 3.1.3.2, “Virtual memory space setup commands (vmcmds)”</a> for details about the vmcmds.</p>
<div class="sect3" lang="en">
<div class="titlepage"><div><div><h4 class="title">
<a name="format_probe"></a>3.1.3.1. Executable format probe</h4></div></div></div>
@@ -419,7 +384,7 @@
<p>Four methods are available in
<code class="filename">src/sys/kern/exec_subr.c</code></p>
<div class="table">
-<a name="id2662276"></a><p class="title"><b>Table 3.2. vmcmd methods</b></p>
+<a name="id40649781"></a><p class="title"><b>Table 3.2. vmcmd methods</b></p>
<table summary="vmcmd methods" border="1">
<colgroup>
<col>
@@ -543,12 +508,7 @@
</div>
<div class="sect2" lang="en">
<div class="titlepage"><div><div><h3 class="title">
-<a name="libc_syscall"></a>3.2.2. System call implementation in libc</h3></div></div></div>
-<p>XXX write me</p>
-</div>
-<div class="sect2" lang="en">
-<div class="titlepage"><div><div><h3 class="title">
-<a name="emul_switch"></a>3.2.3. Multiple kernel ABI support with the emul switch</h3></div></div></div>
+<a name="emul_switch"></a>3.2.2. Multiple kernel ABI support with the emul switch</h3></div></div></div>
<p>The <span class="type">struct emul</span> is defined in
<code class="filename">src/sys/sys/proc.h</code>. It defines various methods
and parameters to handle system calls and traps. Each kernel ABI
@@ -567,10 +527,14 @@
</div>
<div class="sect2" lang="en">
<div class="titlepage"><div><div><h3 class="title">
-<a name="syscalls_master"></a>3.2.4. The syscalls.master table</h3></div></div></div>
+<a name="syscalls_master"></a>3.2.3. The syscalls.master table</h3></div></div></div>
<p>Each kernel ABI have a system call table. The table maps system
call numbers to functions implementing the system call in the kernel
- (e.g.: system call number 2 is <code class="function">fork</code>).
+ (e.g.: system call number 2 is <code class="function">fork</code>). By
+ convention, the kernel function implementing a native syscall
+ <code class="function">foo</code>
+ is called <code class="function">sys_foo</code>. Emulated ABIs have
+ their own conventions, such as the linux_sys_ prefix used by the Linux emulation.
The native system call table can be found in
<code class="filename">src/sys/kern/syscalls.master</code>.</p>
<p>This file is not written in C language. After any change, it
@@ -580,7 +544,7 @@
<code class="filename">syscalls.conf</code>, and it will output several
files:</p>
<div class="table">
-<a name="id2662689"></a><p class="title"><b>Table 3.3. Files produced from <code class="filename">syscalls.master</code></b></p>
+<a name="id40650184"></a><p class="title"><b>Table 3.3. Files produced from <code class="filename">syscalls.master</code></b></p>
<table summary="Files produced from syscalls.master" border="1">
<colgroup>
<col>
@@ -605,7 +569,7 @@
<tr>
<td><code class="filename">syscall.h</code></td>
<td>Preprocessor defines for each system call name and
- number</td>
+ number (used by libc)</td>
</tr>
<tr>
<td><code class="filename">sysent.c</code></td>
@@ -669,10 +633,206 @@
calling thread, <em class="parameter"><code>v</code></em> is the syscallarg structure
pointer, and <em class="parameter"><code>retval</code></em> is a pointer to the return
value.</p>
+<p>While generating the files listed above, some substitutions are
+ performed on function names: the names of syscalls tagged as
+ COMPAT_XX get the prefix compat_xx_, so the kernel functions
+ implementing those syscalls have to be named accordingly. For
+ example, if
+ <code class="filename">syscalls.master</code> has a line
+</p>
+<pre class="programlisting">97 COMPAT_30 { int sys_socket(int domain, int type, int protocol); }</pre>
+<p>
+ the actual syscall function will have this prototype:
+ </p>
+<div class="funcsynopsis">
+<table border="0" summary="Function synopsis" cellspacing="0" cellpadding="0" style="padding-bottom: 1em">
+<tr>
+<td><code class="funcdef">int <b class="fsfunc">compat_30_sys_socket</b>(</code></td>
+<td>
+<var class="pdparam">l</var>, </td>
+<td> </td>
+</tr>
+<tr>
+<td> </td>
+<td>
+<var class="pdparam">v</var>, </td>
+<td> </td>
+</tr>
+<tr>
+<td> </td>
+<td>
+<var class="pdparam">retval</var><code>)</code>;</td>
+<td> </td>
+</tr>
+</table>
+<table border="0" summary="Function argument synopsis" cellspacing="0" cellpadding="0">
+<tr>
+<td>struct lwp * </td>
+<td>
+<var class="pdparam">l</var>;</td>
+</tr>
+<tr>
+<td>void * </td>
+<td>
+<var class="pdparam">v</var>;</td>
+</tr>
+<tr>
+<td>register_t * </td>
+<td>
+<var class="pdparam">retval</var>;</td>
+</tr>
+</table>
+</div>
+<p>
+ and <em class="parameter"><code>v</code></em> is a pointer to struct
+ compat_30_sys_socket_args.
+ </p>
+</div>
+<div class="sect2" lang="en">
+<div class="titlepage"><div><div><h3 class="title">
+<a name="libc_syscall"></a>3.2.4. System call implementation in libc</h3></div></div></div>
+<p>The system call stubs in libc are generated automatically
+ from the kernel's system call table. The
+ <code class="filename">syscall.h</code> file contains defines which map
+ syscall names to syscall numbers; in these names, the sys_ prefix of
+ the kernel function is replaced by SYS_. Including
+ "SYS.h" provides this header and the RSYSCALL macro. RSYSCALL
+ takes a syscall name, prepends SYS_ to look up the
+ syscall number, and defines a function
+ with the given name whose body simply executes
+ the syscall with that number. (How the syscall is executed
+ and how the number and arguments are passed is machine
+ dependent; these details are hidden in the RSYSCALL macro.) </p>
+<p>For example, the libc implementation of the access(2)
+ function consists of an access.S file containing just:
+</p>
+<pre class="programlisting">#include "SYS.h"
+RSYSCALL(access)</pre>
+<p>
+
+ Even this file need not be written by hand: it is enough to add its
+ name to the ASM variable in
+ <code class="filename">libc/sys/Makefile.inc</code>, and the file will be
+ generated with this content when libc is built. </p>
+<p>This works for libc functions which correspond exactly
+ to kernel syscalls. That is not always the case, even for
+ functions documented in section 2 of the manual. For example, the
+ wait, wait3 and waitpid functions are all implemented as wrappers around
+ a single syscall, wait4. In such a case, the procedure above yields
+ the wait4 function, and the wrappers can call it as if it
+ were a normal C function. </p>
+</div>
+<div class="sect2" lang="en">
+<div class="titlepage"><div><div><h3 class="title">
+<a name="id40650381"></a>3.2.5. Versioning a system call</h3></div></div></div>
+<p>If the ABI (or even the API) of a system call changes, it is
+ necessary to keep implementing the old syscall with its original semantics
+ for use by old binaries. The new version of the syscall gets a
+ new syscall number, while the original one retains the old
+ number. This is called versioning.</p>
+<p>The naming conventions associated with versioning are
+ complex. If the original system call is called foo (and
+ implemented by a sys_foo function) and it is changed after the x.y
+ release, the new syscall will be named __fooxy, and the function
+ implementing it will be named sys___fooxy. The original syscall
+ (kept for compatibility) will still be declared as sys_foo in
+ <code class="filename">syscalls.master</code>, but will be tagged as
+ COMPAT_XY, so its function will be named compat_xy_sys_foo. In
+ the procedure described below, we call sys_foo the original version,
+ sys___fooxy the new version and compat_xy_sys_foo the
+ compatibility version.</p>
+<p>Now if the syscall is versioned again after version z.q has
+ been released, the newest version will be called __foozq. The
+ intermediate version (formerly the new version) will have to be
+ retained for compatibility, so it will be tagged as COMPAT_ZQ,
+ which will change the function name from sys___fooxy to
+ compat_zq_sys___fooxy. The oldest version compat_xy_sys_foo will
+ be unaffected by the second versioning.
+ </p>
+<p>What needs to be done:
+ </p>
+<div class="itemizedlist"><ul type="disc">
+<li>tag the old version with COMPAT_XY in
+ <code class="filename">syscalls.master</code>
+ </li>
+<li>add the new version at the end of
+ <code class="filename">syscalls.master</code> (this effectively allocates a
+ new syscall number)
+ </li>
+<li>name the new version as described above
+ </li>
+<li>implement the compatibility version, name it
+ compat_xy_sys_... as described above. The implementation belongs
+ under <code class="filename">src/sys/compat</code>. It should not be a
+ modified copy of the new version, because the copies would
+ eventually diverge. Rather, it should be implemented in terms of
+ the new version, adding the adjustments needed for compatibility
+ so that it behaves exactly as the old version did.
+ </li>
+<li>find all references to the old syscall function in the
+ kernel and point them to the compatibility version or to the new
+ version as appropriate. (The kernel would not link otherwise.)
+ </li>
+</ul></div>
+<p>
+ Now the kernel should be compilable and old statically linked
+ binaries should work, as should binaries using the old
+ libc. Nothing uses the new syscall yet. We have to make a new
+ libc, which will contain both the new syscall and a compatibility
+ version of the old function:
+ </p>
+<div class="itemizedlist"><ul type="disc">
+<li>in <code class="filename">libc/sys/Makefile.inc</code>, replace
+ the name of the old syscall with the name of the new one (__fooxy in our
+ example). When libc is rebuilt, it will contain the new function,
+ but no program uses this internal, underscore-prefixed name yet, so it is
+ not useful by itself. Also, libc no longer provides the old name.</li>
+<li>
+<p>To make newly compiled programs use the new syscall when
+ they refer to the usual name (foo in our example), we add a
+ __RENAME(__fooxy) statement after the declaration of foo in the
+ system header file where foo is declared:
+ </p>
+<pre class="programlisting">int foo(int, int, int)
+#if !defined(__LIBC12_SOURCE__) && !defined(_STANDALONE)
+__RENAME(__fooxy)
+#endif
+;</pre>
+<p>
+ Now, when a program is recompiled using this header, references to
+ foo will be replaced by __fooxy, except in code compiled with
+ _STANDALONE defined (standalone tools, basically bootloaders) or with __LIBC12_SOURCE__ defined (parts of libc itself). Old
+ binaries are unaware of this and continue to reference foo.
+ </p>
+</li>
+<li>To make the old binaries work with the new libc, we must
+ add the old function back. We add it under
+ <code class="filename">libc/compat/sys</code>, implementing it in terms of the
+ new function. Note that this does not use the compatibility syscall
+ in the kernel at all, so old programs will work with the new libc
+ even if the kernel is built without COMPAT_XY. The compatibility
+ syscall is needed only by the old libc, which is still used when the
+ shared library has not been upgraded, or which is linked into old statically
+ linked programs. </li>
+</ul></div>
+<p>
+ We are done. We have covered three cases: old binaries with old libc on a
+ new kernel (including statically linked binaries); old binaries with
+ new libc on a new kernel; and new binaries with new libc on a new kernel.
+ </p>
+<p>When committing, one should remember to commit the source
+ (<code class="filename">syscalls.master</code>) for the autogenerated files
+ first, and then regenerate and commit the autogenerated
+ files. The generated files contain the RCS Id of the source file, and committing in this order
+ ensures that this Id refers to the committed source version. The assembly
+ files generated by <code class="filename">libc/sys/Makefile.inc</code> are
+ not kept in the repository at all; they are regenerated every time
+ libc is built.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage"><div><div><h3 class="title">
-<a name="to64"></a>3.2.5. Managing 32 bit system calls on 64 bit systems</h3></div></div></div>
+<a name="to64"></a>3.2.6. Managing 32 bit system calls on 64 bit systems</h3></div></div></div>
<p>When executing 32 bit binaries on a 64 bit system, care must be
taken to only use addresses below 4 GB. This is a problem at
process creation, when the stack and heap are allocated, but also for
--T4sUOijqQbZv57TR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="chap-processes.xml.diff"
Index: chap-processes.xml
===================================================================
RCS file: /cvsroot/htdocs/Documentation/internals/en/chap-processes.xml,v
retrieving revision 1.4
diff -u -r1.4 chap-processes.xml
--- chap-processes.xml 3 Mar 2006 12:01:21 -0000 1.4
+++ chap-processes.xml 30 Aug 2006 01:33:48 -0000
@@ -565,12 +565,7 @@
<title>Traps</title>
<para>XXX write me</para>
</sect2>
-
- <sect2 id="libc_syscall">
- <title>System call implementation in libc</title>
- <para>XXX write me</para>
- </sect2>
-
+
<sect2 id="emul_switch">
<title>Multiple kernel ABI support with the emul switch</title>
<para>The <type>struct emul</type> is defined in
@@ -596,7 +591,11 @@
<title>The syscalls.master table</title>
<para>Each kernel ABI have a system call table. The table maps system
call numbers to functions implementing the system call in the kernel
- (e.g.: system call number 2 is <function>fork</function>).
+ (e.g.: system call number 2 is <function>fork</function>). By
+ convention, the kernel function implementing a native syscall
+ <function>foo</function>
+ is called <function>sys_foo</function>. Emulated ABIs have
+ their own conventions, such as the linux_sys_ prefix used by the Linux emulation.
The native system call table can be found in
<filename>src/sys/kern/syscalls.master</filename>.</para>
@@ -633,7 +632,7 @@
<row>
<entry><filename>syscall.h</filename></entry>
<entry>Preprocessor defines for each system call name and
- number</entry>
+ number (used by libc)</entry>
</row>
<row>
<entry><filename>sysent.c</filename></entry>
@@ -648,7 +647,7 @@
<para>In order to avoid namespace collision, non native ABI have
<filename>syscalls.conf</filename> defining output file names prefixed
by tags (e.g: linux_ for Linux ABI).</para>
-
+
<para>system call argument structures (syscallarg for short) are
always used to pass arguments to functions implementing the system
calls. Each system call has its own syscallarg structure. This
@@ -668,6 +667,194 @@
calling thread, <parameter>v</parameter> is the syscallarg structure
pointer, and <parameter>retval</parameter> is a pointer to the return
value.</para>
+
+ <para>While generating the files listed above, some substitutions are
+ performed on function names: the names of syscalls tagged as
+ COMPAT_XX get the prefix compat_xx_, so the kernel functions
+ implementing those syscalls have to be named accordingly. For
+ example, if
+ <filename>syscalls.master</filename> has a line
+<programlisting>
+<![CDATA[
+97 COMPAT_30 { int sys_socket(int domain, int type, int protocol); }
+]]>
+</programlisting>
+ the actual syscall function will have this prototype:
+ <funcsynopsis>
+ <funcprototype>
+ <funcdef>int <function>compat_30_sys_socket</function></funcdef>
+ <paramdef>struct lwp *<parameter>l</parameter></paramdef>
+ <paramdef>void * <parameter>v</parameter></paramdef>
+ <paramdef>register_t *<parameter>retval</parameter></paramdef>
+ </funcprototype>
+ </funcsynopsis>
+ and <parameter>v</parameter> is a pointer to struct
+ compat_30_sys_socket_args.
+ </para>
+
+ </sect2>
+
+ <sect2 id="libc_syscall">
+ <title>System call implementation in libc</title>
+ <para>The system call stubs in libc are generated automatically
+ from the kernel's system call table. The
+ <filename>syscall.h</filename> file contains defines which map
+ syscall names to syscall numbers; in these names, the sys_ prefix of
+ the kernel function is replaced by SYS_. Including
+ "SYS.h" provides this header and the RSYSCALL macro. RSYSCALL
+ takes a syscall name, prepends SYS_ to look up the
+ syscall number, and defines a function
+ with the given name whose body simply executes
+ the syscall with that number. (How the syscall is executed
+ and how the number and arguments are passed is machine
+ dependent; these details are hidden in the RSYSCALL macro.) </para>
+
+ <para>For example, the libc implementation of the access(2)
+ function consists of an access.S file containing just:
+<programlisting>
+<![CDATA[
+#include "SYS.h"
+RSYSCALL(access)
+]]>
+</programlisting>
+
+ Even this file need not be written by hand: it is enough to add its
+ name to the ASM variable in
+ <filename>libc/sys/Makefile.inc</filename>, and the file will be
+ generated with this content when libc is built. </para>
+
+ <para>This works for libc functions which correspond exactly
+ to kernel syscalls. That is not always the case, even for
+ functions documented in section 2 of the manual. For example, the
+ wait, wait3 and waitpid functions are all implemented as wrappers around
+ a single syscall, wait4. In such a case, the procedure above yields
+ the wait4 function, and the wrappers can call it as if it
+ were a normal C function. </para>
+
+ </sect2>
+
+ <sect2><title>Versioning a system call</title>
+ <para>If the ABI (or even the API) of a system call changes, it is
+ necessary to keep implementing the old syscall with its original semantics
+ for use by old binaries. The new version of the syscall gets a
+ new syscall number, while the original one retains the old
+ number. This is called versioning.</para>
+
+ <para>The naming conventions associated with versioning are
+ complex. If the original system call is called foo (and
+ implemented by a sys_foo function) and it is changed after the x.y
+ release, the new syscall will be named __fooxy, and the function
+ implementing it will be named sys___fooxy. The original syscall
+ (kept for compatibility) will still be declared as sys_foo in
+ <filename>syscalls.master</filename>, but will be tagged as
+ COMPAT_XY, so its function will be named compat_xy_sys_foo. In
+ the procedure described below, we call sys_foo the original version,
+ sys___fooxy the new version and compat_xy_sys_foo the
+ compatibility version.</para>
+ <para>Now if the syscall is versioned again after version z.q has
+ been released, the newest version will be called __foozq. The
+ intermediate version (formerly the new version) will have to be
+ retained for compatibility, so it will be tagged as COMPAT_ZQ,
+ which will change the function name from sys___fooxy to
+ compat_zq_sys___fooxy. The oldest version compat_xy_sys_foo will
+ be unaffected by the second versioning.
+ </para>
+
+ <para>What needs to be done:
+ <itemizedlist>
+ <listitem>
+ <simpara>tag the old version with COMPAT_XY in
+ <filename>syscalls.master</filename>
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>add the new version at the end of
+ <filename>syscalls.master</filename> (this effectively allocates a
+ new syscall number)
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>name the new version as described above
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>implement the compatibility version, name it
+ compat_xy_sys_... as described above. The implementation belongs
+ under <filename>src/sys/compat</filename>. It should not be a
+ modified copy of the new version, because the copies would
+ eventually diverge. Rather, it should be implemented in terms of
+ the new version, adding the adjustments needed for compatibility
+ so that it behaves exactly as the old version did.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>find all references to the old syscall function in the
+ kernel and point them to the compatibility version or to the new
+ version as appropriate. (The kernel would not link otherwise.)
+ </simpara>
+ </listitem>
+ </itemizedlist>
+ Now the kernel should be compilable and old statically linked
+ binaries should work, as should binaries using the old
+ libc. Nothing uses the new syscall yet. We have to make a new
+ libc, which will contain both the new syscall and a compatibility
+ version of the old function:
+ <itemizedlist>
+ <listitem>
+ <simpara>in <filename>libc/sys/Makefile.inc</filename>, replace
+ the name of the old syscall with the name of the new one (__fooxy in our
+ example). When libc is rebuilt, it will contain the new function,
+ but no program uses this internal, underscore-prefixed name yet, so it is
+ not useful by itself. Also, libc no longer provides the old name.</simpara>
+ </listitem>
+ <listitem>
+ <para>To make newly compiled programs use the new syscall when
+ they refer to the usual name (foo in our example), we add a
+ __RENAME(__fooxy) statement after the declaration of foo in the
+ system header file where foo is declared:
+ <programlisting>
+<![CDATA[
+int foo(int, int, int)
+#if !defined(__LIBC12_SOURCE__) && !defined(_STANDALONE)
+__RENAME(__fooxy)
+#endif
+;
+]]>
+</programlisting>
+ Now, when a program is recompiled using this header, references to
+ foo will be replaced by __fooxy, except in code compiled with
+ _STANDALONE defined (standalone tools, basically bootloaders) or with __LIBC12_SOURCE__ defined (parts of libc itself). Old
+ binaries are unaware of this and continue to reference foo.
+ </para>
+ </listitem>
+ <listitem>
+ <simpara>To make the old binaries work with the new libc, we must
+ add the old function back. We add it under
+ <filename>libc/compat/sys</filename>, implementing it in terms of the
+ new function. Note that this does not use the compatibility syscall
+ in the kernel at all, so old programs will work with the new libc
+ even if the kernel is built without COMPAT_XY. The compatibility
+ syscall is needed only by the old libc, which is still used when the
+ shared library has not been upgraded, or which is linked into old statically
+ linked programs. </simpara>
+ </listitem>
+ </itemizedlist>
+ We are done. We have covered three cases: old binaries with old libc on a
+ new kernel (including statically linked binaries); old binaries with
+ new libc on a new kernel; and new binaries with new libc on a new kernel.
+ </para>
+
+ <para>When committing, one should remember to commit the source
+ (<filename>syscalls.master</filename>) for the autogenerated files
+ first, and then regenerate and commit the autogenerated
+ files. The generated files contain the RCS Id of the source file, and committing in this order
+ ensures that this Id refers to the committed source version. The assembly
+ files generated by <filename>libc/sys/Makefile.inc</filename> are
+ not kept in the repository at all; they are regenerated every time
+ libc is built.</para>
</sect2>
<sect2 id="to64">
--T4sUOijqQbZv57TR--