Subject: Re: Implementation of POSIX message queue
To: None <tech-kern@NetBSD.org>
From: Mindaugas R. <rmind@NetBSD.org>
List: tech-kern
Date: 08/17/2007 03:02:48
Andrew Doran <ad@netbsd.org> wrote:
> 	pool_init(&mqueue_poll, sizeof(struct mqueue), 0, 0, 0,
> 	    "mqueue_poll", &pool_allocator_nointr, IPL_NONE);
> 	pool_init(&mqdes_poll, sizeof(struct mq_desc), 0, 0, 0,
> 	    "mqdes_poll", &pool_allocator_nointr, IPL_NONE);
> 
> I don't think it's worth having a pool for these since they are "low volume"
> items.

OK.

> On the other hand, it may be worthwhile having a pool for struct mq_msg,
> and then having a limited amount of storage within mq_msg itself for small
> messages. See ktealloc() in kern_trace.c.

This is a point I wanted to discuss. POSIX defines that message size is
fixed (for optimization purposes). There is a default value of message size,
however user could set the custom size in mq_open().
So your suggestion would be to always use pool with default size and allocate
the data part with kmem if bigger size is needed? Sounds good, just default
size changing via sysctl would be problematic..

> 	 * This is done without lock held, so while we are
> 	 * allocating - the count of descriptors might change, etc.
> 	 * Also, the file descriptors table must be consistent.
> 
> Really needs to be fixed before it goes in. The usual "check, allocate,
> lock, check again, unlock+free+retry" dance applies I think :-).

Yes, FIXME will be removed.

> <...>
> Should we not later be returning EPERM to the caller instead of EBADF?

Erm.. yes.

> 	*timo = mstohz((ts.tv_sec * 1000) + (ts.tv_nsec / 1000000));
> 
> tstohz() does the same thing and itimespecfix() is probably of use here.

Thanks. Someone should document this..

> 		/* Check the limit */
> 		if (p->p_mqueue_cnt == mq_open_max) {
> 
> That check's OK without any locks, but we need another one later while
> mqueue_lock is held.

Definitely.

> 	/*
> 	 * According to POSIX - no message should be removed on
> 	 * failing case, thus we will check the pointers here.
> 	 */
> 	if (subyte(msg_ptr, 0) == -1)
> 		return EFAULT;
> 	if (msg_prio && (subyte(msg_prio, 0) == -1))
> 		return EFAULT;
> 
> There's no real point to this and it's expensive to do. Does it make the
> system call pass some conformance test?
>
> <...>
>
> 	(void)copyout(msg->msg_ptr, msg_ptr, msg->msg_len);
> 
> The system call shouldn't fail silently, even if it means throwing the
> message away. The value from copyout should be returned.

POSIX defines, that if call fails - no message should be removed. In such
case, I think it is reasonable to validate the user-space pointer at start,
and ignore it later (as wrote in comment). Otherwise, if copyout() fails,
revert of message removal would be needed - that is problematic..

> 	/* XXXlock: If it fails - the new attributes should not be set */
> 
> Does the spec say so? I can't imagine it's a big deal..

Yes, defined by POSIX.

> 
> 	/* Wake up all waiters, if there are such */
> 	if (mq->mq_attrib.mq_flags & MQ_RECEIVE)
> 		cv_broadcast(&mq->mq_send_cv);
> 	if (mq->mq_attrib.mq_flags & MQ_SEND)
> 		cv_broadcast(&mq->mq_recv_cv);
>
> These kinds of tests are't needed, since cv_broadcast/cv_signal already do
> it for you.

Right, "missthinked".

> Any functions that allocate or free memory aren't MP safe, and the same goes
> for file descriptor operations. Once the vmlocking stuff gets merged they
> can be marked MP safe.

Waiting for vmlocking.. :)

> > - For message queue descriptors, I would like to use a file descriptor
> > allocator - fdalloc()/fdremove() (follow the #ifdefs in the code), but
> > there is a problem - there is no need to allocate the file structure or
> > allow open(), read(), close(), etc calls, and fdalloc/fdremove is not
> > working with NULL fdp->fd_ofiles[fd].
> 
> Whyso? Is this for fork/clone? Are select/poll valid on a message queue?

Thought it is good API for this. Yes, clone/fork should copy the descriptors.
No, select()/poll() and all the rest would not be valid. That is why I
wont file structure, only descriptor table.

-- 
Best regards,
Mindaugas
www.NetBSD.org