tech-kern: Re: CMSG

Subject: Re: CMSG_* problems
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Robert Elz <kre@munnari.OZ.AU>
List: tech-kern
Date: 02/14/2007 14:59:39
    Date:        Wed, 14 Feb 2007 00:09:48 -0500 (EST)
    From:        der Mouse <mouse@Rodents.Montreal.QC.CA>
    Message-ID:  <200702140525.AAA26447@Sparkle.Rodents.Montreal.QC.CA>

  | It is?  Where did you get that?  My aim is to write *clean* application
  | code.

Oh.   Really.   Portable I could see from the suggestion.
Clean is a stretch.

  | Which wouldn't bother me.  Much of my code doesn't work anywhere but
  | NetBSD already,

If you're willing to assume NetBSD then you don't need anything
added to NetBSD headers, you can write your own macros (which I
assume you have already done) and simply reference the internals
of the NetBSD data structs.   That is, you can know exactly
what is in the NetBSD sys/socket.h.

I'd assumed you were looking for something that would allow portable
code (to more than NetBSD) which would then preclude your code
knowing anything about NetBSD internals - thus requiring any
new access methods to be NetBSD provided rather than belonging
to your applications.

  | Indeed.  And how can I do that without having an implementation of my
  | proposal to point to?  Innovation requires an innovator, requires
  | someone to be first.
  | 
  | Of course, I can do that without involving NetBSD per se,

Yes... (to all the last quoted segment).

  | But I had thought NetBSD was about writing clean
  | code, and - at least to my own eye! - macros such as I proposed permit
  | significantly cleaner code.

More portable code (for one kind of portability issue yes), but
cleaner, no.  They build in application knowledge of the kind of
data structs that the application shouldn't have.   That is,
you know that there's an integer offset that you can add to
something to get the next header, and that you can increment that
integer offset to move from one header to another.   That's the
kind of knowledge that the CMSG_FIRSTHDR() CMSG_NXTHDR() CMSG_DATA()
interface avoids.   To me that makes the CMSG_* interface (the
2292 interface) cleaner than the one you're proposing.
Slightly harder to use portably perhaps, but cleaner.

Eg: to take the example from 2292 you quoted in a previous message:

    for (cmsgptr = CMSG_FIRSTHDR(&msg); cmsgptr != NULL;
         cmsgptr = CMSG_NXTHDR(&msg, cmsgptr)) {
        if (cmsgptr->cmsg_level == ... && cmsgptr->cmsg_type == ... ) {
            u_char  *ptr;

            ptr = CMSG_DATA(cmsgptr);
            /* process data pointed to by ptr */
        }
    }

If I wanted to clean that up, what I'd do is more like

    for (cmsgptr = CMSG_FIRSTHDR(&msg); cmsgptr != NULL;
         cmsgptr = CMSG_NXTHDR(&msg, cmsgptr)) {
        if (CMSG_LVL(cmsgptr) == ... && CMSG_TYPE(cmsgptr) == ... ) {
	    whatever_t x;

	    memcpy(&x, CMSG_DATA(cmsgptr), sizeof x);
            /* process data in x */
        }
    }

If you wanted you could add a CMSG_DATALEN() macro and
check that CMSG_DATALEN(cmsgptr) == sizeof x before the
memcpy - that is, if you don't trust the OS not to
lie to you about what it is giving back.

This way the struct returned (the whatever_t) has no problems
with alignment or anything, and if anyone cares about alignment
of the cmsg headers themselves (that is, if we aren't willing
to simply demand that msg be properly aligned to hold a struct
cmsghdr) then we can bury the details of how to handle that inside
the CMSG_NXTHDR() CMSG_LVL() and CMSG_TYPE() macros (etc).

Personally I wouldn't bother, the application writer is
supposed to know that the buffer is to hold cmsg hdrs, if
the buffer provided is not suitable for that, just allow
the code to fail.

After all, if I'm reading a file, that contains a sequence
of longs, and I do

	char buf[BUFSIZE];
	long *lp;

	n = read(fd, buf, BUFSIZE);
	lp = (long *)buf;
	if (*lp ...

I don't bitch and moan about the read interface being
inadequate when the code dies on the *lp reference,
do I?   Do you?

That kind of enhancement (addition of CMSG_TYPE() and
CMSG_LEVEL()) is to me much cleaner than what you proposed,
while being just as portable.   It allows the application
to remain in ignorance of the actual mechanics of the
cmsg lists, which is a good thing.  Then it just requires
(as does your proposal) the programmer discipline to
copy the struct (or whatever) out of the cmsg header
data space, and into a local variable, before referencing it
rather than simply dereferencing the CMSG_DATA() pointer.

  | True.  If you'd like to try to hash out a better interface without even
  | considering compatability, I'd be interested.

I would - but first we have to agree on what compatability I'm
(or we're) willing to forego.   I wouldn't change the kernel/user
API in this area, doing that means either versioned interfaces
in the kernel, or flag days, neither of which is very nice.
Sometimes one or the other is inevitable, but here the
interface is OK, all that's need is for it to be used sanely.

So, what I'd prefer to do is to simply allow applications to
ignore the CMSG_* macros (if they want, they'd still be there
for applications that already use them, or for anyone else
who likes that way) and instead provide a whole new interface
for scanning the ancillary data which is much easier to use
for most applications (though perhaps not as fast).

That may involve calling a function which makes callbacks
passing back (properly aligned) structures (copied out of
the cmsg buffer if necessary) or providing function(s) that
will search for and find a cmsg header of a particular type,
and return its data (in a properly aligned struct - together
with a "free" function companion).  Or ...

That could be done, the functions/macros that implement it could
be made available to all, they'd probably use the CMSG_*
access stuff internally (perhaps along with some extra glue)
so code using them could be both architecturally and
OS portable, and be truly clean.

kre