Subject: Re: Implementation of POSIX message queue
To: Mindaugas R. <rmind@NetBSD.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 08/17/2007 11:51:21
Aside: I was looking at the spec and noticed that the following routines
need to be cancellation points:

    aio_suspend() mq_receive() mq_send() mq_timedreceive() mq_timedsend()

See:

    http://nxr.netbsd.org/source/xref/lib/libpthread/pthread_cancelstub.c

This might be complicated slightly by having them in librt. I'm not sure
what the solution is.

On Fri, Aug 17, 2007 at 06:19:20AM +0300, Mindaugas R. wrote:

> David Holland <dholland+netbsd@eecs.harvard.edu> wrote:
> > There are, unfortunately, two problems with what you've got: (1) you
> > haven't validated the whole block, only the first byte, so copyout can
> > still fail... and (2) even if you do validate the whole block in
> > advance, the copyout call can still fail if another thread has
> > rearranged the memory map in the meantime.
> 
> As I wrote in the source:
> 	/*
> 	 * Copy the data to the user-space.  In this stage user can falsify
> 	 * the pointers, but it would be a violation, thus the result will
> 	 * not be checked, and memory will be freed.
> 	 */
> The intention is to check for invalid pointers, not violations. I am not sure
> if it is worth checking (1) point.

But the check doesn't buy you anything. All it tells you is that the system
call might fail with EFAULT, or that it might work and not return EFAULT - but
that's always going to be the case!

The principle of least astonishment applies here. If copying the message out
fails, then you should return an error to the caller. Consider a message
queue that's being used to carry credit card transactions in an EPOS *
system. If copyout() fails and we don't return an error to the caller, then
we've just lost a transaction and there's absolutely no indication that
something has gone wrong. If it's a refund or a high value purchase then
that means trouble - all you can do is scratch your head and apologise.

I can think of a few options:

o Hold the entire message queue locked while copying out the message. That
  sucks, but if the copyout() fails we can leave the message there.

o Have a receiver lock on the queue and hold it while copying out. Again
  that's not nice, but it means messages could still be pushed onto the
  queue while you're copying out.

o If POSIX does not dictate a strict order for messages pushed onto and
  popped off the queue, then if the copyout fails we can re-enqueue the
  message at an advantageous spot (so it gets taken off the queue by
  another receiver sooner rather than later).
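
To make the last option concrete, here is a rough userland sketch of "pop,
copy out, put it back if the copy fails". None of this is the real code:
struct mqueue, fake_copyout() and mq_receive_sketch() are names made up for
illustration, the queue is a plain array, and the locking is only described
in comments. It assumes that putting the message back at the head is
acceptable, which depends on what POSIX requires for ordering.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MQ_MAX 8

struct msg {
	size_t len;
	char data[32];
};

struct mqueue {
	struct msg *slots[MQ_MAX];	/* slots[0] is the head */
	size_t count;
};

/* Hypothetical stand-in for copyout(): a NULL destination "faults". */
static int
fake_copyout(const void *src, void *udst, size_t len)
{
	if (udst == NULL)
		return EFAULT;
	memcpy(udst, src, len);
	return 0;
}

static int
mq_push_tail(struct mqueue *q, struct msg *m)
{
	if (q->count == MQ_MAX)
		return EAGAIN;
	q->slots[q->count++] = m;
	return 0;
}

/*
 * Put a message back at the head.  A real implementation would have to
 * keep the slot reserved while the lock is dropped, otherwise a sender
 * could fill the queue in the meantime.
 */
static void
mq_push_head(struct mqueue *q, struct msg *m)
{
	assert(q->count < MQ_MAX);
	memmove(&q->slots[1], &q->slots[0], q->count * sizeof(q->slots[0]));
	q->slots[0] = m;
	q->count++;
}

static struct msg *
mq_pop_head(struct mqueue *q)
{
	struct msg *m;

	if (q->count == 0)
		return NULL;
	m = q->slots[0];
	q->count--;
	memmove(&q->slots[0], &q->slots[1], q->count * sizeof(q->slots[0]));
	return m;
}

static int
mq_receive_sketch(struct mqueue *q, void *ubuf, size_t buflen, size_t *lenp)
{
	struct msg *m;
	int error;

	m = mq_pop_head(q);		/* would happen with the queue locked */
	if (m == NULL)
		return EAGAIN;
	if (m->len > buflen) {
		mq_push_head(q, m);
		return EMSGSIZE;
	}
	/* Queue lock would be dropped here, around the copy. */
	error = fake_copyout(m->data, ubuf, m->len);
	if (error != 0) {
		mq_push_head(q, m);	/* retake the lock, keep the message */
		return error;		/* and tell the caller about it */
	}
	*lenp = m->len;
	return 0;
}
```

The point is only that a failed copy leaves the message on the queue and the
caller sees EFAULT, rather than the message being freed silently.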

Andrew

* This is one area where SCO OpenServer still seems to be quite popular.
It would be interesting to see how well our emulation works given recent
events!