Subject: Re: Prototype kernel continuation-passing for NetBSD
To: Jonathan Stone <jonathan@dsg.stanford.edu>
From: Matt Thomas <matt@3am-software.com>
List: tech-kern
Date: 03/27/2004 11:50:29

On Mar 27, 2004, at 10:35 AM, Jonathan Stone wrote:

> Hi Matt,
>
> We seem to be at cross-purposes again. The whole point of this change
> is to be able to call sosend() from outside of process context: from a
> softint or (in my case) from a kcont.

I understand that.  The procp is passed via the uio.  The uio struct
has a uio_procp which contains the proc pointer.

> For that, I believe the sosend/so_send changes you describe as "not
> needed" are actually essential.  Good thing I said I was happy to wait
> for a while :-/
>
> I tried, and it just doesn't work to use the uio: for splice(),
> particularly DGRAM-to-DGRAM, you really want to pass in a non-NULL
> mbuf chain, and a NULL uio. You can't use the struct proc in a uio, if
> you don't have a uio to start with.

Don't pass in a NULL uio.  However, I'm trying to figure out what sosend
needs the proc for in your case.  The only thing sosend uses the proc
for is to increment p->p_stats->p_ru.ru_msgsnd.  For an in-kernel
caller, that can be avoided.  If you really want that, then change
the "if (uio)" test that sets resid to "if (uio && uio->uio_iov)", and
you will have the proper semantics.
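
A minimal user-space sketch of that test, to make the suggestion concrete.
It is not the NetBSD sosend() code: the sk_ types, the flattened ru_msgsnd
counter, and the pkt_len field are hypothetical stand-ins, and only the
uio_iov, uio_resid and uio_procp field names come from the discussion
above.  The idea is that a caller with an mbuf chain passes a uio whose
uio_iov is NULL, so the uio still carries the proc pointer while the
length comes from the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumptions). */
struct sk_proc  { long ru_msgsnd; };   /* models p->p_stats->p_ru.ru_msgsnd */
struct sk_iovec { void *iov_base; size_t iov_len; };
struct sk_mbuf  { size_t pkt_len; };   /* models the chain's packet length */
struct sk_uio {
    struct sk_iovec *uio_iov;    /* NULL when the data is an mbuf chain */
    size_t           uio_resid;  /* bytes left to copy from the iovecs */
    struct sk_proc  *uio_procp;  /* proc to charge; may be NULL */
};

/*
 * Return the number of bytes to send and charge one message to the
 * proc carried in the uio, if there is one.
 */
static size_t
sk_send_resid(struct sk_uio *uio, struct sk_mbuf *top)
{
    struct sk_proc *p = (uio != NULL) ? uio->uio_procp : NULL;
    size_t resid;

    if (uio != NULL && uio->uio_iov != NULL) {
        resid = uio->uio_resid;      /* data described by the uio */
    } else {
        assert(top != NULL);         /* otherwise a chain is required */
        resid = top->pkt_len;        /* data already in mbufs */
    }
    if (p != NULL)
        p->ru_msgsnd++;              /* the only use of the proc here */
    return resid;
}
```

With a uio of { NULL, 0, &p } and a 512-byte chain, the sketch returns 512
and bumps p.ru_msgsnd once; with uio_procp set to NULL the accounting is
skipped altogether, which is the in-kernel-caller case.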

> For socreate(): sure, it's reasonable, so I did it whilst I was in
> there fixing sosend(). Last I looked, these changes still don't go far
> enough to call socreate() from a kcont. I have tried; but creating a
> PF_UNIX socket runs smack into uses of curproc down in the VFS layer,
> when tying the socket into the filesystem namespace.

As I indicated above, the changes to sosend aren't needed.  In fact,
I'm running a kernel with socreate and soconnect and dom_externalize
getting passed the proc pointer.

> For splice() [and for the tee() Jeff Mogul suggested], one can mandate
> that userspace do the socreate() and pass an open fd into the syscall
> that sets up the kcont()-based splice()/tee() code.  For my own
> private uses, I've used a kthread (tho' you have to be careful about
> setting its filesystem context first).
>
> There are several cases (nfs, etc) where you suggest passing a NULL
> process to socreate() instead of curproc.  I agree, these really
> shouldn't be using curproc. But I've tried, and when last I tried
> it[*], passing NULL would trigger kernel panics. Sure, I'd like to fix
> those to not use curproc *and* not panic; but I can't guarantee to fix
> them all right now.  Passing curproc is conservative and safe (in the
> sense that the semantic end-result is exactly the same as what we have
> now, where socreate() just uses curproc).

The only point in nfs where the process is not available (eventually)
is on the nfs_reconnect.  And using curproc for that is wrong.  (think
user nfs mount over tcp and then a reconnect, which would happen using
a root kthread).  But looking at the code, it uses &proc0 for sobind,
and I see no reason to be inconsistent, so why don't we use &proc0 for
socreate and soconnect as well?
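
As a rough illustration of the consistency argument (a user-space sketch,
not the nfs_reconnect code): every sk_ name below is hypothetical, and the
stubs only record which proc each call was handed.  The point is that the
reconnect path names &proc0 explicitly for all three calls, so the result
does not depend on whichever process happens to be current.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins (assumptions, not kernel definitions). */
struct sk_proc { int uid; };
static struct sk_proc  sk_proc0 = { 0 };   /* models proc0 */
static struct sk_proc *sk_curproc;         /* models curproc */

/* Records which proc each socket operation was performed on behalf of. */
struct sk_socket {
    struct sk_proc *created_by;
    struct sk_proc *bound_by;
    struct sk_proc *connected_by;
};

static void sk_socreate(struct sk_socket *so, struct sk_proc *p)  { so->created_by = p; }
static void sk_sobind(struct sk_socket *so, struct sk_proc *p)    { so->bound_by = p; }
static void sk_soconnect(struct sk_socket *so, struct sk_proc *p) { so->connected_by = p; }

/* Reconnect using one explicit proc throughout, never sk_curproc. */
static void
sk_reconnect(struct sk_socket *so)
{
    sk_socreate(so, &sk_proc0);
    sk_sobind(so, &sk_proc0);
    sk_soconnect(so, &sk_proc0);
}
```

Setting sk_curproc to some unrelated user process before calling
sk_reconnect() leaves all three recorded pointers equal to &sk_proc0.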

> Again, this could wait for a day or two. Or I could commit part
> (or all) now: whichever you prefer.

That leaves the only stuff using curproc the nfs_boot code and that's
just fine with me.

> [*] What I was using wasn't exactly NetBSD, though.

I think I'm going to make kconts optional for 2.0, and then expand/fix
their use post-2.0.  The branch for 2.0 is near, and the issues are too
massive to fix before branching.

-- 
Matt Thomas                     email: matt@3am-software.com
3am Software Foundry              www: http://3am-software.com/bio/matt/
Cupertino, CA              disclaimer: I avow all knowledge of this message.