tech-kern: Re: Implementation of POSIX message queue

Subject: Re: Implementation of POSIX message queue
To: Mindaugas R. <rmind@NetBSD.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 08/16/2007 20:17:27
Hi,

On Thu, Aug 16, 2007 at 12:52:58AM +0300, Mindaugas R. wrote:

> Hello,
> 
> here is the implementation of POSIX message queues:

Nice work!
 
> http://www.netbsd.org/~rmind/mqueue.diff
> http://www.netbsd.org/~rmind/sys_mqueue.c
> http://www.netbsd.org/~rmind/sys_mqueue.h (src/sys/sys/mqueue.h)
> http://www.netbsd.org/~rmind/include_mqueue.h (src/include/mqueue.h)
> 
> Most of the POSIX Test-Suite tests passes.
> Please review.

 * Lock order:
 * 	mqueue_lock
 * 	  -> mqueue::mq_mtx

A brief description in the header file or here about which locks protect
what would be useful, if you're willing to do that.

	pool_init(&mqueue_poll, sizeof(struct mqueue), 0, 0, 0,
	    "mqueue_poll", &pool_allocator_nointr, IPL_NONE);
	pool_init(&mqdes_poll, sizeof(struct mq_desc), 0, 0, 0,
	    "mqdes_poll", &pool_allocator_nointr, IPL_NONE);

I don't think it's worth having a pool for these since they are "low volume"
items. On the other hand, it may be worthwhile having a pool for struct
mq_msg, and then having a limited amount of storage within mq_msg itself for
small messages. See ktealloc() in kern_trace.c.

	 * This is done without lock held, so while we are
	 * allocating - the count of descriptors might change, etc.
	 * Also, the file descriptors table must be consistent.

Really needs to be fixed before it goes in. The usual "check, allocate,
lock, check again, unlock+free+retry" dance applies I think :-).

	/* TODO: Hashing */

hash32_str()?

		/* Lock the message queue, and return .. */
		mutex_enter(&mq->mq_mtx);
		/* Check the access mode and permission */
		if (acc_mode != VNOVAL) {
			if (((acc_mode & mqdes->mq_acc_mode) == 0) ||
			    mqueue_access(l, mq, acc_mode)) {
				mutex_exit(&mq->mq_mtx);
				mq = NULL;
			}
		}
		break;

Should we not later be returning EPERM to the caller instead of EBADF?

	*timo = mstohz((ts.tv_sec * 1000) + (ts.tv_nsec / 1000000));

tstohz() does the same thing and itimespecfix() is probably of use here.

		/* Check the limit */
		if (p->p_mqueue_cnt == mq_open_max) {

That check's OK without any locks, but we need another one later while
mqueue_lock is held.

	/*
	 * According to POSIX - no message should be removed on
	 * failing case, thus we will check the pointers here.
	 */
	if (subyte(msg_ptr, 0) == -1)
		return EFAULT;
	if (msg_prio && (subyte(msg_prio, 0) == -1))
		return EFAULT;

There's no real point to this and it's expensive to do. Does it make the
system call pass some conformance test?

		/*
		 * Block until someone sends the message.
		 * While doing this, notification should not be sent.
		 */
		mq->mq_attrib.mq_flags |= MQ_RECEIVE;
		error = cv_timedwait_sig(&mq->mq_send_cv, &mq->mq_mtx, t);
		mq->mq_attrib.mq_flags &= ~MQ_RECEIVE;
		if (error || (mq->mq_attrib.mq_flags & MQ_UNLINK)) {
			mutex_exit(&mq->mq_mtx);
			return (error == EWOULDBLOCK) ? ETIMEDOUT : EINTR;
		}

error should be returned here instead of EINTR, unless the system call is
deemed to be non-restartable in which case it should read:

			switch (error) {
			case EWOULDBLOCK:
				return ETIMEDOUT;
			case ERESTART:
				return EINTR;
			default:
				return error;
			}

	(void)copyout(msg->msg_ptr, msg_ptr, msg->msg_len);

The system call shouldn't fail silently, even if it means throwing the
message away. The value from copyout should be returned.

	/* XXXlock: If it fails - the new attributes should not be set */

Does the spec say so? I can't imagine it's a big deal..

	/* Wake up all waiters, if there are such */
	if (mq->mq_attrib.mq_flags & MQ_RECEIVE)
		cv_broadcast(&mq->mq_send_cv);
	if (mq->mq_attrib.mq_flags & MQ_SEND)
		cv_broadcast(&mq->mq_recv_cv);

These kinds of tests are't needed, since cv_broadcast/cv_signal already do
it for you.

+257	STD MPSAFE	{ mqd_t sys_mq_open(const char * name, int oflag, \
+			    mode_t mode, struct mq_attr *attr); }

...

Any functions that allocate or free memory aren't MP safe, and the same goes
for file descriptor operations. Once the vmlocking stuff gets merged they can
be marked MP safe.

> - For message queue descriptors, I would like to use a file descriptor
> allocator - fdalloc()/fdremove() (follow the #ifdefs in the code), but there
> is a problem - there is no need to allocate the file structure or allow
> open(), read(), close(), etc calls, and fdalloc/fdremove is not working
> with NULL fdp->fd_ofiles[fd].

Whyso? Is this for fork/clone? Are select/poll valid on a message queue?

> - Currently, proc::p_mqd list of descriptors is protected by global
> mqueue_lock RW-lock. I am not sure if this worth inventing a separate
> per-process lock, because only mq_open(), mq_close() and mq_unlink() would
> compete (they acquires write lock). In normal case, these should not be an
> often calls in the system. Anyway - any thoughts on this?

If you find a non-contrived usage and lockstat shows it's a problem then
maybe we should look at it. I agree with Jason, it should be fine for now.

Thanks,
Andrew