Subject: Re: CVS commit: src
To: Mindaugas Rasiukevicius <rmind@netbsd.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 04/30/2007 23:13:18
Hi,
On Mon, Apr 30, 2007 at 02:44:31PM +0000, Mindaugas Rasiukevicius wrote:
> Modified Files:
> src/distrib/sets/lists/comp: mi
> src/include: Makefile
> src/lib/librt: Makefile
> src/sys/conf: files
> src/sys/ddb: db_command.c db_interface.h db_xxx.c
> src/sys/kern: kern_exit.c kern_fork.c syscalls.master
> src/sys/sys: Makefile proc.h unistd.h
> Added Files:
> src/include: aio.h
> src/lib/librt/sys: Makefile.inc
> src/sys/kern: vfs_aio.c
> src/sys/sys: aio.h
>
> Log Message:
> Import of POSIX Asynchronous I/O.
> Seems to be quite stable. Some work still left to do.
>
> Please note, that syscalls are not yet MP-safe, because
> of the file and vnode subsystems.
Cool! I have some more comments & suggestions now that this is in the tree.
o It belongs in sys_aio.c for a couple of reasons. The file implements only
syscalls and direct support for them, and it's not a VFS component as
such. It should be possible to do AIO on other types of file descriptor.
o I'm still not fond of copying out the state to userspace only for the
kernel to copy it back in later when asked for it. I'm not familiar with
the ins and outs of the POSIX AIO stuff, but I'd prefer to see some kind
of job control block stick around in the kernel until userspace calls in
to collect it. Would that be possible?
o Holding p->p_mutex across mutex_init and pool_init is dicey since they can
sleep in order to allocate memory. There are number of other things that
depend on p_mutex, so if an LWP sleeps here it could tie other LWPs up in
knots waiting. What I suggest to fix that is a flag in p_flag and perhaps
another condvar that says "AIO is being initialized". Other threads that
end up in aio_init would wait for that flag to clear, then return when it
has cleared:
mutex_enter(&p->p_mutex);
if ((p->p_flag & PK_AIOINIT) != 0 || p->p_aio != NULL) {
/* Already initializing or initialized. */
while ((p->p_flag & PK_AIOINIT) != 0)
cv_wait(&p->p_mutex, &p->p_somecv);
if (p->p_aio != NULL) {
mutex_exit(&p->p_mutex);
return;
} else {
/*
* Another thread's initialization attempt
* may have failed: retry.
*/
}
}
o Testing for p->p_aio to be non-null appears to be unsafe as it can
currently be non-null while AIO is not fully initialized. To fix that
the worker thread could also check for the "AIO is initializing" flag in
p_flag, and wait until it clears. Then, we could set p_aio as the last
action in aio_init, like this:
/* No longer initializing. */
mutex_enter(&p->p_mutex);
p->p_aio = aio;
p->p_flag &= ~PK_AIOINIT;
cv_broadcast(&p->p_somecv);
mutex_exit(&p->p_mutex);
o In aio_exit, it's better to cache the value of p_aio, otherwise it will
generate a lot more machine code. If you don't believe me, try it both
ways and use objdump -d to check :-)
o In aio_worker, we should check for values other than EINTR too, so != 0
may be a better choice:
if (cv_wait_sig(&aio->aio_worker_cv,
&aio->aio_mtx) == EINTR) {
o In aio_process, it looks like this will leak file::f_slock:
if (auio.uio_resid > SSIZE_MAX) {
error = EINVAL;
goto done;
}
o In aio_enqueue_job, this check doesn't make a lot of sense since e.g
copyin() can block.
/* Check for the limit */
if (aio_jobs_count + 1 > AIO_MAX) /* XXXSMP */
return EAGAIN;
Really you should check the value and increment it in one go. With the
atomic ops API that would look something like this, but on a UP system
the idea is the same: do the check and allocate the resource at the
same time.
do {
u_int count = aio_jobs_count;
if (count + 1 > AIO_MAX)
return EAGAIN;
} while (atomic_cas_int(&aio_jobs_count, count, count + 1));
o In aio_enqueue_job:
/*
* Look for already existing job. If found - the job is in-progress.
* According to POSIX this is invalid, so return the error.
*/
This doesn't make sense because we unlock immediately after the check, but
without reserving the job ID. More generally the subsequent calls to e.g.
aio_init or copyout can block and a job with the same ID can be enqueued.
Traversing the list will get really expensive so some kind of a hash would
be good.
o In sys_aio_cancel, copyout is being called with aio_mtx held, That's valid
but will choke up anything else in the process that tries to use AIO if the
thread blocks while copying out the structure.
Cheers,
Andrew