Subject: Re: CVS commit: src
To: None <tech-kern@netbsd.org>
From: Mindaugas R. <rmind@NetBSD.org>
List: tech-kern
Date: 05/01/2007 15:36:30
People... where you have been all that week?...

Andrew Doran <ad@netbsd.org> wrote:
> o It belongs in sys_aio.c for a couple of reasons. The file implements only
>   syscalls and direct support for them, and it's not a VFS component as
>   such. It should be possible to do AIO on other types of file descriptor.
So you want to rename vfs_aio.c to sys_aio.c, after commit?... :(
OK, but the same is valid for vfs_bio.c, and probably various other files.

> o I'm still not fond of copying out the state to userspace only for the
>   kernel to copy it back in later when asked for it. I'm not familiar with
>   the ins and outs of the POSIX AIO stuff, but I'd prefer to see some kind
>   of job control block stick around in the kernel until userspace calls in
>   to collect it. Would that be possible?
Understood. There are no, of course, POSIX requirements for that. But
avoiding copyin() would cost the locking of aio_mtx and searching in the
queue. Keeping in mind that people also write such code:
	while (aio_error (aiocbp) == EINPROGRESS);
.. I would prefer the current way.

> o Holding p->p_mutex across mutex_init and pool_init is dicey since they can
>   sleep in order to allocate memory.  There are number of other things that
>   depend on p_mutex, so if an LWP sleeps here it could tie other LWPs up in
>   knots waiting. What I suggest to fix that is a flag in p_flag and perhaps
>   another condvar that says "AIO is being initialized". Other threads that
>   end up in aio_init would wait for that flag to clear, then return when it
>   has cleared:
> <...>
> o Testing for p->p_aio to be non-null appears to be unsafe as it can
>   currently be non-null while AIO is not fully initialized. To fix that
>   the worker thread could also check for the "AIO is initializing" flag in
>   p_flag, and wait until it clears. Then, we could set p_aio as the last
>   action in aio_init, like this:
> <...>
Previously you have suggested to make aio_mtx per proc structure, and
initialize it when creating a process. Here is a recheck:
	/* Recheck if we are really first */
	mutex_enter(&p->p_mutex);
	if (p->p_aio) {
		mutex_exit(&p->p_mutex);
		kmem_free(aio, sizeof(struct aioproc));
		return 0;
	}
	...
What about using aio_mtx instead of p_mutex for p_aio protection, and to fix
the MP-safeness problem, set the p->p_aio after the first part of init, like
that:
	...
	TAILQ_INIT(&aio->jobs_queue);
	p->p_aio = aio;
	mutex_exit(&p->aio_mtx);
In such case thread would sleep with aio_mtx held, but I do not see this as
a problem - AIO in this moment is not ready, aio_mtx is used only by AIO
subsystem and other early threads should wait anyway. Perhaps memory
allocation for p_aio should be moved after the mutex too, to avoid a useless
allocations in a case of a lot of early calls. The same model should be used
in aio_init() failing case, when calling aio_exit() (normal aio_exit()ing is
safe). What about this variant? Maybe I am missing something?

> o In aio_exit, it's better to cache the value of p_aio, otherwise it will
>   generate a lot more machine code. If you don't believe me, try it both
>   ways and use objdump -d to check :-)
You say - you know :)

> o In aio_enqueue_job, this check doesn't make a lot of sense since e.g
> <...>
>   Really you should check the value and increment it in one go. With the
>   atomic ops API that would look something like this, but on a UP system
>   the idea is the same: do the check and allocate the resource at the
>   same time.
> <...>
Yes, but we have to wait for the merge of atomic ops API branch for fixing
those XXXSMP.

> o In aio_enqueue_job:
> <...>
>   This doesn't make sense because we unlock immediately after the check, but
>   without reserving the job ID. More generally the subsequent calls to e.g. 
>   aio_init or copyout can block and a job with the same ID can be enqueued. 
Actually, I am thinking about removing this check at all. It is probably not
worth. POSIX denied that, and if application pass the aiocb, which is
EINPROGRESS - then it is application's fault - return value will be simply
overwritten. Also, it is expensive to do such check. What do you think?

>   Traversing the list will get really expensive so some kind of a hash would
>   be good.
Optimizing the search in the list is on the TODO. I have some ideas, about
making it per file descriptor. Talking about hash - reasonable, but at first
it would be good to observe how big queue of the jobs could be in a "real
world" situations e.g. with database.

The rest things are fixed (in my local tree, for now). Thanks!

-- 
Best regards,
Mindaugas
www.NetBSD.org