Subject: Re: POSIX.4 real-time extensions?
To: Jaromír <jdolecek@netbsd.org>
From: glen mccready <gkm@petting-zoo.net>
List: tech-kern
Date: 07/06/2001 10:40:53
>Bill Sommerfeld wrote:
>> my understanding based on second-hand information is that aio requests
>> are simply queued for a separate thread and handled synchronously (one
>> at a time) by that worker thread.
>
>I suppose there is aio kthread per process in FreeBSD, right?

Nope.

>> a real aio implementation would expose maximum parallelism to the
>> underlying hardware i/o system.
>
>Yeah, though since the aio concept is quite alien to BSD kernel,
>nontrivial changes are needed to make it possible to not require
>process context for each aio request. Of course, a possible side-step
>would be to use global pool of aio kthreads instead of single thread.

This is what FreeBSD does.  But they also have a fast-path for VCHR
devices.  See src/sys/kern/vfs_aio.c in a recent FreeBSD tree.

>Other possible solution would be to have something to copy context
>from one thread to other. Then, it would be possible to e.g. only
>use a kthread until biowait(), there save the context and release
>the kthread. Once the i/o would be done, biodone() would get a kthread
>from pool, give it the context saved in biowait() and let it return
>to the code calling biowait(). Something similar to setjmp()/longjmp()
>in concept, though the stack size would probably be a problem.

An interesting way to work around functionality that should simply be
present in the lower layers...

>How do other OS manage this, e.g. Solaris? (It seems Linux doesn't support
>this, at least I can't find any mention of aio in 2.3.49 kernel sources).

Fear for my knowledge of this part of Linux, having been paid to implement
this very functionality for "raw devices" recently.  And it works much
like FreeBSD's VCHR path.  You queue requests onto the lower layers and part
of the request includes a callback function.  That function can do some
minimal work (in Linux) and then queue the results.  At that point you can
either signal the app and have it make a syscall and finish cleanup there,
or you can spin a kthread to do the clean up and return the results.  Doing
AIO against regular files would be tougher and would likely require either
a pool of kthreads, or whole new entry points for the filesystems.
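
To make the callback model above concrete, here is a minimal userspace
sketch in C.  It is not taken from the Linux or FreeBSD sources: the names
(struct aio_req, req_done, lower_layer_complete, reap_one) are all
hypothetical, and the "lower layer" is just a function call standing in
for an interrupt-time completion.  The callback does only minimal work
(it links the finished request onto a done queue), and the results are
reaped later, as the app would do from a syscall after being signalled.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical request record; not a real kernel structure. */
struct aio_req {
    int             id;                       /* caller's tag for the request */
    ssize_t         result;                   /* bytes transferred, or -1 */
    void          (*done)(struct aio_req *);  /* completion callback */
    struct aio_req *next;                     /* link on the done queue */
};

/* FIFO of completed requests waiting to be reaped. */
struct done_queue {
    struct aio_req *head, *tail;
    size_t          count;
};

static struct done_queue completed;

/* Completion callback: minimal work only, just queue the result. */
static void req_done(struct aio_req *r)
{
    r->next = NULL;
    if (completed.tail != NULL)
        completed.tail->next = r;
    else
        completed.head = r;
    completed.tail = r;
    completed.count++;
}

/* Stand-in for the lower layer finishing the I/O (assumption: a real
 * driver would call this from its completion path). */
static void lower_layer_complete(struct aio_req *r, ssize_t nbytes)
{
    assert(r != NULL && r->done != NULL);
    r->result = nbytes;
    r->done(r);
}

/* Reap one finished request in completion order, or NULL if none. */
static struct aio_req *reap_one(void)
{
    struct aio_req *r = completed.head;

    if (r == NULL)
        return NULL;
    completed.head = r->next;
    if (completed.head == NULL)
        completed.tail = NULL;
    completed.count--;
    r->next = NULL;
    return r;
}
```

Note that requests come back in completion order, not submission order,
and that this sketch does no locking; anything real would have to protect
the done queue against the completion path.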

All in all a non-trivial venture and one that will take some coordination
across the system to get right.  (Although I'm sure you could hack up
something very quickly by just using a system-wide pool of kthreads,
limiting the per-process use of such a pool, etc, etc... oh.. yeah.. like
FreeBSD. :-)
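
For what "limiting the per-process use of such a pool" might look like,
here is a rough sketch of just the accounting, in C.  The limits, the toy
process table, and the function names are all made up for illustration;
this is not how FreeBSD does it, and there are no actual threads or
locking here.

```c
#include <stdbool.h>

#define POOL_WORKERS   8    /* hypothetical system-wide worker count */
#define PER_PROC_LIMIT 3    /* hypothetical cap on workers per process */
#define MAX_PROCS      16   /* size of the toy per-process table */

static int busy_workers;         /* workers currently handed out */
static int in_use[MAX_PROCS];    /* workers held, indexed by process slot */

/* Try to claim a worker for process slot p.  A false return means the
 * caller has to queue the request or fail it. */
static bool pool_acquire(int p)
{
    if (p < 0 || p >= MAX_PROCS)
        return false;
    if (busy_workers >= POOL_WORKERS)    /* pool exhausted */
        return false;
    if (in_use[p] >= PER_PROC_LIMIT)     /* this process is at its cap */
        return false;
    busy_workers++;
    in_use[p]++;
    return true;
}

/* Give a worker back once the request it served has completed. */
static void pool_release(int p)
{
    if (p < 0 || p >= MAX_PROCS || in_use[p] == 0)
        return;
    in_use[p]--;
    busy_workers--;
}
```

The per-process cap is what keeps one aio-happy process from starving
everyone else of workers.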

You may now express your dislike for Bach and bread() / breada(), and
general blocking semantics altogether.

glen