Possibility of basing a QNX-like OS on NetBSD?

To: tech-kern%netbsd.org@localhost
Subject: Possibility of basing a QNX-like OS on NetBSD?
From: Andrew Warkentin <andreww591%gmail.com@localhost>
Date: Sat, 25 May 2024 05:48:56 -0600

I'm currently working on a QNX-like microkernel OS based on a fork of
seL4 and an original root server. Recently I had someone suggest that
I should look at trying to turn NetBSD into a QNX-like microkernel
because seL4's focus is more on static non-Unix-like systems. However,
I think that would be more difficult than it seems at first glance
despite NetBSD being a Unix-like general-purpose OS like QNX. Even
though QNX looks superficially similar to conventional Unix-like OSes
in a lot of ways and is quite compatible with them, it is really quite
different from them in some pretty important ways.

One of the biggest of these is the IPC model. QNX's IPC essentially
acts as a cross-address space function call, and not a one-way message
queue. When a client process sends a message to a server over a
channel, the remainder of its timeslice is transferred directly to the
server process and a context switch occurs immediately, entirely
bypassing the scheduler queue in most cases. The client process is
blocked while the server processes the message. Once the server is
done, it sends a reply (rather than specifying the channel, it instead
specifies the message ID that it got when it received the message),
and the same direct context switch happens again in the opposite
direction, and the client is unblocked. It is possible for a server to
receive further messages even if it has previous messages that it
hasn't replied to. A collection of data buffers of arbitrary size and
location (specified by a readv()/writev()-style vector) may be
transferred in both directions; these buffers are copied directly from
the address space of the sender to that of the receiver with no
intermediary buffers in kernel space. seL4's IPC already has basically
identical call/receive/reply with direct context switch semantics to
QNX, although it is limited to copying between per-thread single-page
buffers rather than arbitrary vectors. In my (currently unnamed) hard
fork of seL4 I already have a working preliminary implementation of
long IPC with arbitrary vectors, and while dealing with seL4's
unconventional preemption semantics was a little tricky, it wasn't
especially difficult to add long copying to the existing IPC layer. On
the other hand, from what I've seen from looking at the NetBSD
sources, implementing QNX-style IPC would require writing a complete
IPC layer from scratch and making several modifications to the
scheduler and virtual memory manager (trying to port a QNX/L4-style
IPC layer from a kernel that already has one probably wouldn't work,
since they're pretty tightly integrated).
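
To make the vectored copy concrete, here is a minimal sketch in C of
moving data from a sender's gather vector straight into a receiver's
scatter vector without a staging buffer. The function name vec_copy()
is hypothetical and is not part of QNX, seL4, or NetBSD; in a real
kernel the two vectors would refer to different address spaces, while
this sketch keeps both in one process so that it can be compiled and
run as ordinary user code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>    /* struct iovec */

/*
 * Hypothetical helper (an assumption for illustration only): copy bytes
 * from the source vector into the destination vector, one overlapping
 * run at a time, with no intermediate buffer. Returns the number of
 * bytes copied, which is the smaller of the two vectors' total lengths.
 */
size_t
vec_copy(const struct iovec *dst, int ndst,
         const struct iovec *src, int nsrc)
{
    size_t total = 0, doff = 0, soff = 0;
    int d = 0, s = 0;

    assert(ndst >= 0 && nsrc >= 0);
    while (d < ndst && s < nsrc) {
        size_t dleft = dst[d].iov_len - doff;
        size_t sleft = src[s].iov_len - soff;
        size_t n = dleft < sleft ? dleft : sleft;

        if (n > 0) {
            memcpy((char *)dst[d].iov_base + doff,
                   (const char *)src[s].iov_base + soff, n);
        }
        total += n;
        doff += n;
        soff += n;
        /* Advance past whichever segment(s) are now exhausted. */
        if (doff == dst[d].iov_len) { d++; doff = 0; }
        if (soff == src[s].iov_len) { s++; soff = 0; }
    }
    return total;
}
```

Segment boundaries on the two sides do not need to line up: a sender
could pass a header and a payload as two segments while the receiver
supplies buffers of entirely different sizes.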

Also, the general architecture is quite different. In order to get
something that is QNX-like enough for my liking, the vast majority of
subsystems would have to be removed from the kernel and moved into
separate user processes. Really it would have to be sort of like an
inverse rump kernel where just the inner kernel with the scheduler,
IPC, virtual memory management and some parts of the VFS would be
left. By the time I'm done, I'm not sure there would be much code left
untouched, and I think it might be easier to just continue on my
current path of using an seL4 fork and an original root server. I may
incorporate more code from other OSes into my root server, since I've
already done a little bit of that, but trying to wholesale convert
something like NetBSD into the kind of OS I want seems like it might
not be worth it. There have been monolithic-to-microkernel conversions
in the past, but all of the ones I'm aware of fall into one of three
groups: conversions of monolithic kernels that were designed from the
start to be converted (e.g. early Mach kernels), "serverizations" where
the inner parts of the monolithic kernel like scheduling and basic
memory management are removed and the remainder is ported to a
purpose-built microkernel (e.g. LP49, Lites, and possibly MkLinux), and
ports that just treat the microkernel as a hypervisor (e.g. L4Linux).
I'm not aware of any conversion in which the outer subsystems of the
kernel were removed and the inner kernel was retained as a microkernel,
and I've read a lot on OS history and played around with most of the
historical OSes I can get my hands on, going back to ones from the 50s.

The VFS architecture specifically is rather different between
something like QNX and more conventional Unix-like systems. QNX's
central VFS or pathname manager (part of the process server) is much
simpler than a conventional Unix VFS; it pretty much just matches path
prefixes and hands out a connection to the channel ID of the server
handling the one that fits the path best; the open request to the
server contains the remainder of the path rather than an inode number.
Server calls generally bypass the VFS entirely and go straight from
the client through the microkernel to the server. The VFS I'm planning
to implement will be a little more conventional in order to allow for
stronger security (QNX's channel IDs are global and can be guessed,
and servers must check the UID/GID of all open requests to enforce
permissions); it will have a directory cache and vnodes and will at
least have the option of sending requests to servers by inode number
instead of path, but unlike a conventional VFS there will be no
on-disk device nodes or numbers for character and block devices.
Servers will export filesystems by opening a "port" file in a special
filesystem (these will have dynamically-assigned numbers but will be
impossible to create on regular filesystems), and the VFS will send
requests over this file; a successful open will drop a file descriptor
into the port on the server side as well as sending the other side of
it to the client, with all reads and writes except for those on
directories bypassing the VFS entirely. I'm not quite sure if writing
something completely original or trying to borrow some VFS code from
elsewhere and significantly modifying it would be easier here.
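
For reference, the QNX-style prefix matching described at the start of
this paragraph can be sketched in a few lines of C. This is only an
illustration under stated assumptions: struct prefix_entry and
resolve_prefix() are hypothetical names, the integer "server" field
stands in for whatever connection handle a real system would hand out,
prefixes are assumed to have no trailing slash (except the root "/"),
and matching on path-component boundaries is my assumption rather than
a documented QNX rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a path prefix and the server that owns it. */
struct prefix_entry {
    const char *prefix;   /* e.g. "/dev/ser1" */
    int         server;   /* stand-in for a channel/connection handle */
};

/*
 * Return the entry with the longest prefix that matches `path` on a
 * path-component boundary, and store the unmatched remainder of the
 * path in *rest. Returns NULL if no entry matches.
 */
const struct prefix_entry *
resolve_prefix(const struct prefix_entry *tab, size_t ntab,
               const char *path, const char **rest)
{
    const struct prefix_entry *best = NULL;
    size_t bestlen = 0;

    assert(path != NULL && rest != NULL);
    for (size_t i = 0; i < ntab; i++) {
        size_t len = strlen(tab[i].prefix);

        if (strncmp(path, tab[i].prefix, len) != 0)
            continue;
        /* "/" matches everything; otherwise require a boundary. */
        if (len > 1 && path[len] != '\0' && path[len] != '/')
            continue;
        if (best == NULL || len > bestlen) {
            best = &tab[i];
            bestlen = len;
        }
    }
    if (best != NULL) {
        /* The remainder is what would go into the open request. */
        *rest = path + bestlen;
        while (**rest == '/')
            (*rest)++;
    }
    return best;
}
```

With entries for "/", "/dev" and "/dev/ser1", a lookup of "/dev/null"
selects the "/dev" entry and leaves "null" as the remainder to be sent
to that server.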

I also wish to disaggregate the usual Unix process model into a far
more thread-centric one where a process is nothing more than a
collection of threads that share a command line and get replaced on
exec, with all of the usual process state like virtual address space,
open file descriptors, and filesystem namespace being separate context
objects that have to be explicitly shared between threads, and the
basic process-creation primitive just creating a completely blank
process that the parent explicitly initializes with all the necessary
state using various APIs (of course, there will be a library that
implements fork(), spawn(), and pthreads on top of this). seL4's
thread/memory model already maps onto this quite well, whereas a
NetBSD derivative would need significantly more work.
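
A rough sketch of what "separate context objects that have to be
explicitly shared" might look like in C follows. Every name here
(struct ctx, struct thread_state, ctx_share(), thread_blank(),
thread_share_all()) is hypothetical and invented for illustration; it
is not the API of my system, seL4, or NetBSD. Each context is modelled
as nothing more than a reference count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted context object (contents omitted). */
struct ctx {
    int refs;
};

/* Per-thread state: each context is attached separately. */
struct thread_state {
    struct ctx *vspace;    /* virtual address space */
    struct ctx *fdtable;   /* open file descriptors */
    struct ctx *fsns;      /* filesystem namespace */
};

struct ctx *
ctx_new(void)
{
    struct ctx *c = calloc(1, sizeof(*c));

    if (c != NULL)
        c->refs = 1;
    return c;
}

/* Sharing is explicit: the caller takes a new reference. */
struct ctx *
ctx_share(struct ctx *c)
{
    assert(c != NULL && c->refs > 0);
    c->refs++;
    return c;
}

/* The basic primitive: a thread with no contexts attached at all. */
struct thread_state
thread_blank(void)
{
    struct thread_state t = { NULL, NULL, NULL };

    return t;
}

/* A pthread-like library call would share every context. */
void
thread_share_all(struct thread_state *t, const struct thread_state *from)
{
    t->vspace = ctx_share(from->vspace);
    t->fdtable = ctx_share(from->fdtable);
    t->fsns = ctx_share(from->fsns);
}
```

A spawn()-like wrapper would instead attach a fresh address space and
descriptor table and share only the contexts it wants the child to
inherit, such as the filesystem namespace.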

Another thing that I'm not sure about is the real-time performance. In
addition to desktop and server use, embedded systems with hard
real-time constraints are also an important use case for this system.
Just being lightweight microkernels goes a long way toward making
QNX/L4-like kernels lend themselves well to real-time use (and these
kinds of kernels often have specific optimizations for hard real-time
as well). I'm not sure if a conventional Unix-like kernel like NetBSD
could consistently match the real-time performance of QNX, even though
I did see one paper that said NetBSD has gotten pretty decent there.

Am I right in thinking I should just stick with my original seL4 fork
+ mostly original root server plan?

One place where I definitely do plan to use code borrowed from
conventional Unix-like OSes is for some major subsystems like device
drivers, disk filesystems, and the network stack; these will be based
on LKL and/or the rump kernel; incorporating shims between the library
kernel and the base VFS for these seems easier than trying to convert
something like NetBSD into a microkernel-ish system wholesale.
Follow-Ups:
- Re: Possibility of basing a QNX-like OS on NetBSD?
  - From: David Holland