Subject: Re: Request for comments: sharing memory between user/kernel space
To: Allen Briggs <briggs@netbsd.org>
From: Zeljko Vrba <zvrba@globalnet.hr>
List: tech-kern
Date: 03/21/2007 15:57:51
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Allen Briggs <briggs@netbsd.org> writes:
>
> 0-copy TCP receive is somewhat problematic.  Usually, you want a stream
> of data, but that data is broken up on the wire into ethernet frames
> that have essentially arbitrary headers and for which the payload is
> rarely, if ever, page-sized and page-aligned (even if you're using
> jumbo frames, cleverly aligned), and they can come in out of order.
> And interspersed in the TCP stream are other packets--other TCP packets,
> UDP packets, non-IP, etc.
>
I'm aware of these problems.  But your 'usually' does not apply to my case: I'm
willing to give up on the stream abstraction on the application level.  The
kernel currently already does everything on the packet-level, and copies data
to a contiguous user buffer.  Why not just return to the user-level an array
of iovecs with proper <pointer,length> pairs referring to the TCP data
payload?  It's work that the kernel does anyway; I'd 'just' (quotes since
I have no clue about implementation complexity) like to replace the data
copying part (which preserves the stream abstraction) with returning the iovec
array where pointers point into application-accessible memory (which breaks
the stream abstraction, and I don't care about that :))
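
To make the idea concrete, here is a minimal user-level sketch, in Python only
for brevity.  It is not an existing NetBSD interface: the function name
payload_views, the fixed HEADER_LEN, and the (offset, total_length) frame
descriptors are all assumptions made up for illustration.  It only shows the
shape of the result, an array of <pointer,length>-like views into memory the
application can already read, with no payload copied.

```python
from typing import List, Tuple

# Hypothetical fixed header size; real TCP/IP headers vary in length.
HEADER_LEN = 4


def payload_views(shared: memoryview,
                  frames: List[Tuple[int, int]]) -> List[memoryview]:
    """Return one zero-copy view per frame payload (an iovec-like array).

    `frames` holds hypothetical (offset, total_length) pairs locating each
    received frame inside the shared buffer.
    """
    return [shared[off + HEADER_LEN:off + length] for off, length in frames]


# Stand-in for application-accessible receive memory holding two frames.
buf = bytearray(b"HDR1hello HDR2world")
views = payload_views(memoryview(buf), [(0, 10), (10, 9)])

# The application walks the pieces in place; slicing a memoryview
# creates a new view of the same memory, not a copy.
total = sum(len(v) for v in views)

# Rebuilding the contiguous stream is exactly the copy this scheme avoids.
stream = b"".join(views)
```

The application here sees two separate payload pieces rather than one
contiguous buffer, which is the stream abstraction being given up.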

>
> perhaps you have something else in mind...  Even with an intelligent
> NIC, it's not 0-copy--it's just 0-kernel-copy--the NIC has to do the
> assembly and copy for you.
>
Yes, the levels of "zeroness" :) By "0-copy" I mean avoiding copying the data
payload between kernel and user level.  It doesn't seem like an impossible
goal if I give up on the stream abstraction.

>
> mbuf chains--which have the scatter/gather information you want for
> this.  
>
What is problematic in "exporting" mbuf chains, together with associated data
payload, to the user-level?

>
> So it sounds to me like you kind of want a kernel application instead
> of a userspace application.
>
Oh, but making it in the kernel removes all the challenge from it :)
Seriously, the kernel already does _in kernel mode_ all that I want it to do
(chunking, scatter-gather i/o, wait queues, signalling, etc).  I 'just' want
to export this way of doing I/O to user-level in order to retain the obvious
benefits of user-level programming.

But hmm, your comment on moving data in-kernel between disk and network... it
made me think a bit.  In any case Bill was right: I was jumping too fast into
implementation details :)

Thanks,
  Zeljko.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)

iD8DBQFGAUfzUIHQih3H6ZQRAzlpAJsGKz5WGXupv+52Z358alFV6GQNeQCgke0A
HeTVz32RWNLhrfakJP3Cfm8=
=ckD6
-----END PGP SIGNATURE-----