Subject: Re: M:N and blocking ops without SA, AIO
To: Matthew Mondor <mm_lists@pulsar-zone.net>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 03/01/2007 09:28:37

On Thu, Mar 01, 2007 at 05:22:45AM -0500, Matthew Mondor wrote:
> From my understanding, an M:N threading model which uses a pool of
> kernel LWPs without using Scheduler Activations would need to be able to
> poll asynchronously in its userland library scheduler for any
> potentially blocking syscalls.

I'm not sure. I don't think we've done enough of a post-mortem to tell.

One idea I've been batting around is to revive SA, but change the way the
kernel tells userland to make a new thread. My understanding is that
scheduler activations were effectively signals: the kernel would fork
a new LWP and run the SA upcall on it.

The problem was that the kernel had to be able to allocate a new LWP
whenever it blocked. That means we had to be able to allocate LWPs even
when the locking forbade it.

My thought was instead to do something more Mach-like. Have userland fork
threads. Make a new system call that forks a new thread/lwp and blocks.
libpthread has already created a thread in userland, and now it just
passes that into the kernel. The kernel then makes the new lwp and runs
the thread on it. The call then checks to see if the process is below its
concurrency level. If not, it blocks until it is. If it is below, it
immediately returns. Userland then looks for a thread to run and repeats
the process. If there's no thread to run, the call just passes in a NULL
thread, or something to say "nothing more to do, just rearming".
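
To make that concrete, here's a rough sketch of the libpthread side of
the loop. The syscall name and the helper are both invented for
illustration; they're not real NetBSD interfaces:

    /*
     * Sketch only: "_lwp_hatch" and "pthread__next_runnable" are
     * made-up names, just to show the blocking/rearming semantics.
     */
    struct __pthread_st;                     /* libpthread's thread */

    int _lwp_hatch(struct __pthread_st *);   /* hypothetical syscall */
    struct __pthread_st *pthread__next_runnable(void);

    static void
    hatch_loop(void)
    {
        struct __pthread_st *t;

        for (;;) {
            /* Pick a runnable user thread, or NULL if the run
             * queue is empty ("nothing to do, just rearming"). */
            t = pthread__next_runnable();

            /*
             * The kernel makes an LWP for t and runs it, then
             * blocks us here until the process drops below its
             * concurrency level; returns at once if already below.
             */
            _lwp_hatch(t);
        }
    }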

The main point is that we change how a new LWP is created. Rather than
doing it when we block, we have something sleeping which we wake. It in
turn does something about it. And we also have a thread sitting around
that acts as a thread factory. Whenever it unblocks, it sets a new thread
up to be run.

Hmmm... In retrospect, it might have worked as well to keep SA, but to
have a per-process kthread that sleeps. Whenever something blocked and
wanted to fork a new LWP, it would instead wake said kthread. Said kthread
then runs, forks an LWP, and runs an upcall on it.
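
In kernel terms that amounts to something like the pseudocode below.
Every name here (sleep_on, lwp_create, sa_upcall) is invented for
illustration; the real interfaces would be whatever the SA code uses:

    /*
     * Pseudocode: per-process "LWP factory" kthread.  None of these
     * names are real kernel interfaces.
     */
    void
    sa_factory(struct proc *p)
    {
        struct lwp *l;

        for (;;) {
            /* Sleep until a blocking LWP wakes us... */
            sleep_on(&p->p_safactory);

            /*
             * ...then, from a context where no scheduler locks
             * are held, allocating a fresh LWP is safe.
             */
            l = lwp_create(p);
            sa_upcall(l, SA_UPCALL_BLOCKED);
        }
    }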

Hmmm.... So how is that different from having a thread in sawait, I now
wonder. Oh, signal delivery.

My understanding was that the problems with SA were in the kernel and how
it handled blocking. As I've managed to wander into confusion quite
easily, we probably need to figure out more about what was wrong before
trying to fix it. Among other things, the fix may end up being simpler.

> Threads which wouldn't invoke calls internally yielding them back to the
> user-space scheduler could be mapped to an LWP so that the kernel may
> preempt them efficiently, but a form of preemption would also be
> required in the user-space scheduler to detect this event and assign
> them to an LWP, potentially with minimal kernel help to efficiently
> detect this condition.
>
> As for I/O blocking syscalls and locking functions, they could be
> remapped to be non-blocking by the library in userland to be handled by
> the userland scheduler.  However:
>
> Although I see that kqueue provides good and efficient mechanisms to
> handle network I/O polling, I noticed lately that it doesn't provide
> similar functionality for disk I/O (or at least the man page doesn't
> mention that it might work on other descriptors than sockets, pipes,
> FIFOs and vnodes (for modification notification)).

In terms of select, the disk is always ready: regular files always report
readable, so readiness polling can't express "this read would sleep on
disk I/O".
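
(A trivial demonstration, if anyone wants to see it; nothing here is
NetBSD-specific:)

    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct pollfd pfd;

        pfd.fd = open("/etc/motd", O_RDONLY);
        pfd.events = POLLIN;

        /* Returns at once: a regular file is always "readable",
         * even if the actual read(2) will sleep on disk I/O. */
        poll(&pfd, 1, -1);
        printf("revents = %#x\n", pfd.revents);
        return 0;
    }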

> After a short exchange on IRC it appeared that AIO would be a feature
> which we're missing to allow asynchronous I/O operations and polling
> for disk I/O.  Without this functionality, any thread blocking for the
> disk would also have to be considered a candidate for mapping to an LWP
> at least temporarily (or to a pool of disk I/O LWP slaves).

I think aio is good, but I don't think this is how it should be used. You
don't have file descriptors that are blocking, you have operations which
are blocking. You can easily have hundreds on one given file descriptor.
So the right way to handle this is something like:

A file descriptor is in aio mode. A write or read comes in. The kernel
allocates an aio structure, copies in info from userland (like addresses,
etc. for a read), fires off the i/o, and returns to userland. When the i/o
completes, the aio gets fed to a kevent to pass to userland. The aio
struct then gets destroyed.

I could have stuff wrong in terms of standards compliance, but I think
that's how it works. Or one way for it to work.
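
For flavor, the application side would look roughly like the standard
POSIX <aio.h> interface (which we don't implement yet, so this is a
sketch of the target semantics; it busy-polls instead of taking a kevent
for brevity):

    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        struct aiocb cb;
        char buf[512];
        int error;

        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = open("/etc/motd", O_RDONLY);
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        /* Queue the read and return immediately; the aiocb, not
         * the descriptor, names the outstanding operation, so many
         * can be in flight on one fd. */
        if (aio_read(&cb) == -1)
            return 1;

        /* Busy-poll for completion; the kevent delivery described
         * above would replace this loop. */
        while ((error = aio_error(&cb)) == EINPROGRESS)
            usleep(1000);

        if (error == 0)
            printf("read %zd bytes\n", aio_return(&cb));
        return 0;
    }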

> It becomes apparent that with all required tools, only non-yielding
> threads performing number crunching (and those for which heuristics show
> enough crunching to justify it) would be directly mapped to an LWP...
>=20
> Do others agree that an M:N implementation without SA done with a
> user-space scheduler would greatly benefit from AIO?  If so, were there
> ever plans for AIO to eventually be implemented on NetBSD?  What are the
> known challenges involved that prevented such a feature so far, if any?
> Other than the buffer cache, DMA and interrupts handled by drivers, and
> a limited part of VFS and UFS layout, the rest of the disk I/O system is
> still currently unknown to me.

I think NetBSD will greatly benefit from AIO, regardless of the threading
model.

Take care,

Bill
