Subject: Re: Request for comments: sharing memory between user/kernel space
To: Bill Stouder-Studenmund <wrstuden@netbsd.org>
From: Zeljko Vrba <zvrba@globalnet.hr>
List: tech-kern
Date: 03/21/2007 17:43:29
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Bill Stouder-Studenmund <wrstuden@netbsd.org> writes:

>
> Well, one thing you could do is add an ioctl on a socket that passes in 
> memory buffers. Then change the socket code so that, on your special 
> sockets, data are copied to these pre-allocated buffers rather than a 
> socket buffer. That will get rid of one copy and would be rather clean.
>
Hm, I like the idea :) "One copy" here obviously(?) refers to kernel->user
copying, but where are the other copies (except for DMA from the card to
system memory)?

>
> If there are no buffers, discard the TCP packets and don't ack the range, 
> and TCP will still work.
>
How would I handle signalling to the application?  I'd need to signal both
data arrival and packet discard.
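
To make the question concrete, here is a rough user-space sketch of how I
picture the pre-allocated buffers plus the two events I'd need to see.  It
only simulates the kernel side; all names (rxpool, NBUF, BUFSZ, the two
counters) are made up for illustration and are not an existing NetBSD
interface.  The assumption is that the application could learn about both
arrival and discard by watching two counters that the kernel bumps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUF  4         /* hypothetical: number of pre-registered buffers */
#define BUFSZ 2048      /* hypothetical: size of each buffer */

/* One application-supplied receive buffer. */
struct rxbuf {
    uint8_t data[BUFSZ];
    size_t  len;        /* valid bytes; 0 means the buffer is free */
};

/* State visible to both the (simulated) kernel and the application. */
struct rxpool {
    struct rxbuf buf[NBUF];
    uint64_t delivered; /* bumped by the "kernel" per stored segment */
    uint64_t dropped;   /* bumped by the "kernel" per discarded segment */
};

/* "Kernel" side: store a segment in a free buffer, or discard it.
 * Returns the buffer index, or -1 if the segment was dropped (a real
 * stack would then not ACK that range, so the peer retransmits). */
static int rxpool_deliver(struct rxpool *p, const void *seg, size_t len)
{
    if (len == 0 || len > BUFSZ) {
        p->dropped++;
        return -1;
    }
    for (int i = 0; i < NBUF; i++) {
        if (p->buf[i].len == 0) {
            memcpy(p->buf[i].data, seg, len);
            p->buf[i].len = len;
            p->delivered++;
            return i;
        }
    }
    p->dropped++;       /* no free buffer: discard */
    return -1;
}

/* Application side: hand a consumed buffer back to the pool. */
static void rxpool_release(struct rxpool *p, int i)
{
    assert(i >= 0 && i < NBUF);
    p->buf[i].len = 0;
}
```

With something like this the application could compare the counters against
the values it saw last time; how it gets woken up to do so (signal, poll,
kevent) is exactly the part I'm asking about.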

>
> Well, one main problem is that the mbufs aren't mapped into the app's 
> address space. And on some architectures, user apps and the kernel are in
>
That's why I wanted to use a shared memory region, mapped into both the
kernel and the application address space.  As I mentioned in the previous
post, basically only x86-64 is of interest to me. [however unpopular that
might be with the NetBSD developers :)]
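
Roughly what I have in mind for the shared region is a single-producer,
single-consumer descriptor ring, sketched below with C11 atomics.  This is
only an illustration under my own assumptions: the names (shring,
RING_SLOTS) are hypothetical, the code does not set up the mapping itself,
and I assume the kernel is the only producer and the application the only
consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8    /* hypothetical; must be a power of two */
static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0, "power of two");

/* Descriptor ring meant to live in a region mapped by both sides.
 * It stores offsets into the region rather than pointers, because the
 * two mappings need not sit at the same virtual address. */
struct shring {
    _Atomic uint32_t head;          /* written by the producer only */
    _Atomic uint32_t tail;          /* written by the consumer only */
    struct { uint32_t off, len; } slot[RING_SLOTS];
};

/* Producer (the kernel, in my scenario): publish one descriptor. */
static bool shring_put(struct shring *r, uint32_t off, uint32_t len)
{
    uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h - t == RING_SLOTS)
        return false;               /* full: the caller has to drop */
    r->slot[h % RING_SLOTS].off = off;
    r->slot[h % RING_SLOTS].len = len;
    /* release: the slot contents become visible before the new head */
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return true;
}

/* Consumer (the application): fetch the oldest descriptor, if any. */
static bool shring_get(struct shring *r, uint32_t *off, uint32_t *len)
{
    uint32_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (h == t)
        return false;               /* empty */
    *off = r->slot[t % RING_SLOTS].off;
    *len = r->slot[t % RING_SLOTS].len;
    /* release: the slot may be reused only after we have read it */
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return true;
}
```

Since each index has exactly one writer, no lock is needed, which is what
makes this attractive for a region shared across the user/kernel boundary.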

>
> another. Further, you really don't want userland to be able to write to 
> the chains, so you'd need a r/o mapping for the headers and a r/w mapping 
> for the data.
>
It'd be nice if bookkeeping could be allocated separately from packet data.  I
don't care if the app mangles incoming packets (unless the kernel keeps the
packet "as is" for future reference, and mangling network headers actually
ruins the network stack - how is it done? Does the kernel refer to the
original packet, or does it copy headers into mbuf chains as needed?)  Anyway,
security/safety is not at the top of my list for the moment.  Obviously, I'll
have to pay attention to how I code my library :)

>
> One thing I'd thought of in the past (for an iSCSI target) was creating the 
> concept of an mbuf cookie. Userland would use a special call to gobble a 
> certain amount of data off of the receive queue for a socket, and those 
> mbufs would hang around in a process-level list. Userland would get back 
> an opaque cookie. The app then decides what it wants to do with the data, 
> then hands the cookie to a modified write call. The write call grabs the 
> mbufs indicated by the cookie and uses them as the data for a write to say 
> disk. Thus zero copy.
>
Nice.  What happened to the idea (i.e. why hasn't it been implemented)?  I
guess that if the application wanted to actually inspect the data, it would
still be copied?
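
Just to check that I understood the cookie idea, here is how I picture the
bookkeeping, as a user-space simulation.  Everything here is hypothetical
(struct chain merely stands in for an mbuf chain, and cookie_take /
cookie_redeem are names I made up); it is not an existing kernel interface.

```c
#include <stddef.h>

#define MAXCOOKIES 16   /* hypothetical per-process limit */

/* Stand-in for an mbuf chain: the data is referenced, never copied. */
struct chain {
    const void *data;
    size_t      len;
};

/* Per-process table of chains parked between the "read" and "write". */
struct cookie_table {
    struct chain parked[MAXCOOKIES];
    int          used[MAXCOOKIES];
};

/* "Read" side: detach a chain from the socket queue and park it.
 * Returns an opaque cookie (>= 0), or -1 if the table is full. */
static int cookie_take(struct cookie_table *t, struct chain c)
{
    for (int i = 0; i < MAXCOOKIES; i++) {
        if (!t->used[i]) {
            t->parked[i] = c;
            t->used[i] = 1;
            return i;
        }
    }
    return -1;
}

/* "Write" side: redeem a cookie, handing the very same chain to the
 * sink.  Returns 0 on success, -1 for a stale or invalid cookie. */
static int cookie_redeem(struct cookie_table *t, int cookie,
                         struct chain *out)
{
    if (cookie < 0 || cookie >= MAXCOOKIES || !t->used[cookie])
        return -1;
    *out = t->parked[cookie];
    t->used[cookie] = 0;
    return 0;
}
```

In this picture the application only ever holds the integer, so the data is
never mapped or copied into user space, which is why I suspect inspecting it
would cost a copy.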

Hm, more details about my project: I'm writing (sort of) a "thread package"
which is based on asynchronous message-passing between "threads" (and includes
a user-level scheduler).  I first wanted to write my own kernel which would
natively support such a coding style, BUT.  There's a bunch of code that I'd
have to write (device drivers, memory management, networking stack, etc.),
which is both infeasible and out of the scope of my project.  On the other
hand, this bunch of code already exists in the NetBSD kernel, is debugged,
well written, and so on, so it'd be foolish of me if I didn't try to
(re)use it.

So, the NetBSD kernel already _is_ internally pretty much asynchronous, no?  I
"just" want to reflect this asynchronicity to the user level.  Native DMA
would become shared memory, and native IRQs would become signals.  On top of
that I want fully 0-copy I/O; I'd discard the "stream abstraction", which
does not exist in the kernel anyway, where everything is packetized (or am I
wrong?)

Why not do everything in the kernel as Allen suggested -- well, user-level has
obvious advantages: a convenient debugger, some sort of protection (I guess
weakened if I implement what I'd like to, but still better than nothing),
being able to use other system services through a familiar API, and so on.

Best regards,
  Zeljko.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)

iD8DBQFGAWC3UIHQih3H6ZQRAwVwAJ9JqNG1eIK+qvHA1VvPKi1YcF4RlwCfRkfg
cHwgWDvDZ1U5GSuN4XdQJdI=
=FavJ
-----END PGP SIGNATURE-----