Subject: Re: Prototype kernel continuation-passing for NetBSD
To: Matt Thomas <matt@3am-software.com>
From: Jonathan Stone <jonathan@dsg.stanford.edu>
List: tech-kern
Date: 03/26/2004 09:15:20
In message <6.0.3.0.2.20040325150418.036996c8@localhost> Matt Thomas writes
>At 01:04 PM 1/28/2004, Jonathan Stone wrote:

[...]

Clearly I should improve kcont(9), because it hasn't been fully
understood.

kcont() will eventually/soon have functionality for asynchronous
notification.  It really _is_ intended to replace [l]tsleep/wakeup(),
so that (for example) you can implement an nfs server without
requiring a context-switch (the sleep/wakeup) to notify the nfs-server
that i/o on a buffer has completed.  Crunch the numbers on filling
over half a 10GbE pipe; context-switch per operation is prohibitive.
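
To make "crunch the numbers" concrete, here is a back-of-the-envelope
sketch in C.  The 8 KiB payload per operation and whatever per-switch
cost is passed to cpu_fraction() are assumptions picked for
illustration; neither figure comes from this thread.

```c
#include <assert.h>

/* Assumed figures, for illustration only (not from the original mail). */
#define LINK_BITS_PER_SEC	10e9	/* 10GbE line rate */
#define LINK_FRACTION		0.5	/* "half a 10GbE pipe" */
#define BYTES_PER_OP		8192.0	/* ASSUMED payload per NFS operation */

/* Operations per second needed to sustain the target data rate. */
static double
ops_per_second(void)
{
	return (LINK_BITS_PER_SEC * LINK_FRACTION / 8.0) / BYTES_PER_OP;
}

/*
 * Fraction of one CPU spent purely on switching, if each operation
 * costs switch_usec microseconds of context-switch overhead.
 * switch_usec is a caller-supplied assumption.
 */
static double
cpu_fraction(double switch_usec)
{
	assert(switch_usec >= 0.0);
	return ops_per_second() * switch_usec * 1e-6;
}
```

Under these assumptions ops_per_second() comes to roughly 76,000, so
every microsecond of per-operation switching overhead costs about 7.6%
of a CPU before any useful work is done.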

That would (obviously) require passing a kcont object down through the
VFS layer, and chaining kconts off a struct buf with pending I/O.
I've looked at adding a struct kcont* to one of the other structs
already passed by reference.
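
A minimal user-space sketch of "chaining continuations off a buffer
with pending I/O" might look like the following.  This is not the
kcont(9) API: struct cont, struct xbuf, xbuf_add_cont(), xbuf_iodone()
and record_status() are all hypothetical names, and locking and IPL
handling are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical continuation: a function, its argument, and a list link. */
struct cont {
	void		(*c_fn)(void *obj, void *arg, int status);
	void		*c_arg;
	struct cont	*c_next;
};

/* Hypothetical stand-in for struct buf; only what the sketch needs. */
struct xbuf {
	int		 b_error;	/* result of the I/O */
	struct cont	*b_conts;	/* continuations waiting on this I/O */
};

/* Chain a continuation off a buffer whose I/O is still pending. */
static void
xbuf_add_cont(struct xbuf *bp, struct cont *c,
    void (*fn)(void *, void *, int), void *arg)
{
	assert(bp != NULL && c != NULL && fn != NULL);
	c->c_fn = fn;
	c->c_arg = arg;
	c->c_next = bp->b_conts;	/* push on the front (LIFO) */
	bp->b_conts = c;
}

/*
 * I/O completion: call every chained continuation directly, where a
 * sleep/wakeup design would call wakeup() and force a context switch.
 * Returns the number of continuations that were run.
 */
static int
xbuf_iodone(struct xbuf *bp, int error)
{
	struct cont *c;
	int n = 0;

	bp->b_error = error;
	while ((c = bp->b_conts) != NULL) {
		bp->b_conts = c->c_next;	/* unlink first; fn may reuse c */
		c->c_fn(bp, c->c_arg, error);
		n++;
	}
	return n;
}

/* Example continuation: store the completion status in *arg. */
static void
record_status(void *obj, void *arg, int status)
{
	(void)obj;
	*(int *)arg = status;
}
```

The caller that starts the I/O hands in its own struct cont, so
completion needs no allocation and no sleeping thread to be woken.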

You're absolutely right about the assumptions I made with IPLs. I'd
respond that IPLs should be explicitly ordered; and the (macppc)
become less shabby. Or we could fix kcont to not rely on that
assumption; I haven't thought hard about how to do that.


>At this point, I would instead make generic softintr required functionality,
>extend their capability, and kill kconts.  Or kill generic softintr and
>use kconts instead.  One or other but not both.

I'd kill the generic softints.  kcont lets you build and queue a kcont
intended to be called *much* later, such as when an I/O on the kcont
"object" completes. If you start reworking generic softints to add
that ability (so that we truly don't need both), you very quickly
arrive at something like kcont.

Code currently using generic softints would have to allocate a struct
kcont. That doesn't cost much, especially if it can be embedded in an
existing long-lived memory object (like a softc).
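
As an illustration of the embedding idea, the sketch below places a
continuation inside a driver softc and recovers the softc from the
continuation pointer.  Everything here (struct econt, struct xx_softc,
CONT_TO_SOFTC, xx_attach(), econt_run()) is a hypothetical name made up
for the example, not part of kcont(9) or of any real driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical continuation, reduced to a bare callback. */
struct econt {
	void	(*c_fn)(struct econt *);
};

/* Hypothetical driver softc with the continuation embedded in it. */
struct xx_softc {
	int		sc_unit;
	int		sc_deferred_runs;	/* times the deferred work ran */
	struct econt	sc_cont;		/* embedded: no extra allocation */
};

/* Recover the enclosing softc from a pointer to its embedded member. */
#define CONT_TO_SOFTC(c) \
	((struct xx_softc *)((char *)(c) - offsetof(struct xx_softc, sc_cont)))

/* Deferred half of the driver's work. */
static void
xx_deferred(struct econt *c)
{
	struct xx_softc *sc = CONT_TO_SOFTC(c);

	sc->sc_deferred_runs++;
}

/* Attach: the continuation is set up once and lives as long as the softc. */
static void
xx_attach(struct xx_softc *sc, int unit)
{
	sc->sc_unit = unit;
	sc->sc_deferred_runs = 0;
	sc->sc_cont.c_fn = xx_deferred;
}

/* Stand-in for whatever machinery eventually runs a queued continuation. */
static void
econt_run(struct econt *c)
{
	assert(c != NULL && c->c_fn != NULL);
	c->c_fn(c);
}
```

Because the continuation shares the softc's lifetime, the interrupt
path never has to allocate or free anything to schedule deferred work.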

I've also had ... considerable experience implementing
application-level protocols inside the kernel.  For that, kcont is a
*huge* win over generic softints.

One use I made of (a prior incarnation of) kcont was implementing
sendfile() and splice(), using socket-upcall functions.  There, one
very quickly finds oneself wanting to shut down sockets (or even close
them, or write to them...) from inside an upcall "callback" function,
which is in turn in the middle of a function activation which is
frobbing the socket.

So you need to defer what the socket-level callback function does until
some later time when the socket is mutable (i.e., when it's safe to call
the protocol-level functions to do stuff on that socket). IME, generic
softints don't have enough flexibility to do that very well; but there
are nontrivial bodies of private code using kcont for exactly that.
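
The deferral pattern described above can be sketched in user-space C
as follows.  All names (struct dcont, struct xsock, xsock_defer(),
xsock_input(), splice_upcall()) are hypothetical and this is not the
kcont(9) interface.  As a simplification, the deferred work is drained
at the end of the same input function; a real kernel would run it at
some later, safe point instead.

```c
#include <assert.h>
#include <stddef.h>

struct xsock;

/* Hypothetical deferred action on a socket. */
struct dcont {
	void		(*d_fn)(struct xsock *);
	struct dcont	*d_next;
};

/* Hypothetical stand-in for struct socket. */
struct xsock {
	int		  so_busy;	/* set while protocol code is mid-update */
	int		  so_shut;	/* set once the socket is shut down */
	void		(*so_upcall)(struct xsock *);
	struct dcont	 *so_deferred;	/* FIFO of deferred actions */
	struct dcont	**so_tailp;	/* where the next action is linked */
};

static void
xsock_init(struct xsock *so, void (*upcall)(struct xsock *))
{
	so->so_busy = 0;
	so->so_shut = 0;
	so->so_upcall = upcall;
	so->so_deferred = NULL;
	so->so_tailp = &so->so_deferred;
}

/* Only legal when nobody is in the middle of modifying the socket. */
static void
xsock_shutdown(struct xsock *so)
{
	assert(!so->so_busy);
	so->so_shut = 1;
}

/* Queue fn to run once the socket is mutable again. */
static void
xsock_defer(struct xsock *so, struct dcont *d, void (*fn)(struct xsock *))
{
	d->d_fn = fn;
	d->d_next = NULL;
	*so->so_tailp = d;
	so->so_tailp = &d->d_next;
}

static struct dcont shutdown_cont;	/* static storage: no allocation */

/* Upcall: runs with so_busy set, so it defers the shutdown. */
static void
splice_upcall(struct xsock *so)
{
	xsock_defer(so, &shutdown_cont, xsock_shutdown);
}

/* Protocol input path: update the socket, upcall, then drain. */
static void
xsock_input(struct xsock *so)
{
	struct dcont *d;

	so->so_busy = 1;
	if (so->so_upcall != NULL)
		so->so_upcall(so);	/* must not call xsock_shutdown() here */
	so->so_busy = 0;

	while ((d = so->so_deferred) != NULL) {
		so->so_deferred = d->d_next;
		if (so->so_deferred == NULL)
			so->so_tailp = &so->so_deferred;
		d->d_fn(so);		/* socket is mutable again */
	}
}
```

Calling xsock_shutdown() straight from splice_upcall() would trip the
assertion, which is the reentrancy hazard the deferral avoids.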

The main reason I didn't commit an example for this is that NetBSD
has too many uses of curproc inside socket code, particularly
Unix-domain sockets. Shortly after a long-awaited event occurs this
weekend, I plan to eliminate several of those curproc uses; and commit
a kcont-based splice(). That might make the benefits of kcont clearer.

...  and I've also been thinking of reworking OCF to use kcont-style
continuations to keep track of its callbacks, instead of its own
internal hand-rolled equivalents.