Re: Lua in-kernel (lbuf library)

To: Christoph Badura <bad%bsd.de@localhost>
Subject: Re: Lua in-kernel (lbuf library)
From: Lourival Vieira Neto <lourival.neto%gmail.com@localhost>
Date: Mon, 4 Nov 2013 18:13:40 -0200
Hi Christoph,

Firstly, thanks for your comments. I really appreciated that =).

BTW, I renamed lbuf to Lua bitwiser and removed support for "unamed"
array access (buf:mask(alignment [, offset, length]) and buf:mask{
length_pos1, length_pos2, ... }). Instead, I introduced a new way to
provide array access (see below).

On Sun, Oct 20, 2013 at 8:05 PM, Christoph Badura <bad%bsd.de@localhost> wrote:
> On Tue, Oct 15, 2013 at 06:01:29PM -0300, Lourival Vieira Neto wrote:
>> > Also, having to switch mentally between zero-based arrays in the kernel C
>> > code and 1-based arrays in the Lua code make my head ache.
>> It's something that doesn't bug me so much.. But, if necessary it
>> could be changed to 0-based in this userdata.
>
> When you create your own data structures, I guess it is a wash.  You have
> to adjust +/-1 in infrequent circumstances in either scenario.
>
> But in this case you are creating a special purpose language that operates
> in universe of zero-based array.  And that's not only the kernel code.
> Every Internet protocol specification that I remember is using zero-based
> indexing.  For someone dealing with both sides (the world and your lua
> library), it makes the difference between constantly having to be alert
> to remember to do the offset adjustment.  That is a lot more mental work
> for anyone working with this library.
>
> If you use 1-based indices talking to protocol people will be funny too:
>
> ``Anyone know why the flags in byte 6 of this packet are funny?''
> ``Sure, that's most likely because the flags are in byte 5.''
>
> I think it is worth thinking hard about this.

Yes.. I'll keep think on this. By now, I'm using 1-based indices to
Lua array-access on buffers and 0-based offsets to bitmask
definitions.

>> > I what is "buffer" and how does it relate to mbufs?
>> Note, non-contiguous buffer still an open problem in lbuf. I don't
>> know if should use a ptrdiff_t  to pass the distance to 'next' field,
>> a 'next()' to return 'next' field or something else.
>
> You seem to be talking about implementation.  I was talking about the
> interface of the library.

Yes, sorry. In that stage, sometimes, it is a little difficult to
separate these things.

>> However, you could create a lbuf from a mbuf header as follows:
>>   lbuf_new(L, mbuf->m_data, mbuf->m_len, NULL, true);
>
> I don't think that is a good way.  You say you want to inspect packet
> data in the kernel.  Well, the packet's data can be spread over a
> chain of mbufs.  Also, mbufs may have internal or externel storage.
> You don't want to deal with that as the user of your library.
>
> As a user, I want an interface like this:
>   lbuf_from_mbuf(L, mbuf, NULL, true);

I'm thinking in providing an interface like this in an adapter layer.
Thus, we could use Lua bitwiser library in other areas. I was just
giving an example of how to use it with the current implementation
(again, sorry for don't separate interface discussion from
implementation).

>> Yes, mea culpa =(. I wasn't clear about that. 'net' flag was the way I
>> found to 'record' the buffer endianness. What means, true if the
>> buffer uses BE and false if it uses HE. It has the same semantics of
>> hton* and ntoh* functions. Don't know if it is better to pass the
>> endianness itself as a flag (e.g., enum { BIG_ENDIAN, LITTLE_ENDIAN,
>> HOST_ENDIAN }). What do you think?
>
> For me the the most convenient interface would be if I didn't have to
> mention the host byteorder.  Just record what byteorder the buffer is in,
> and convert when appropriate.  Alan made a good point.  It maybe be
> convenient and/or necessary to specify a different byteorder in a mask.

I'm working on it; thinking in the following form:

bitwiser.mask{ field = { offset, length, sign, endian, step }}, where
sign is a boolean, endian is a string like 'host', 'h', 'big', 'b',
'little', 'l', 'net' or 'n', and step is a number between [1, 64].

And (to allow omitting or non-ordering of parameters):

bitwiser.mask{ field =
  { __offset = offset, __length = length, __sign = sign, __endian =
endian, __step = step }}

defaults are __sign = true, __endian = 'host' and __step = undef. If
step is present or length is omitted, the lib assumes that it is a
segment field, what means that it should return a bitwiser.buffer
userdatum if accessed, which can be accessed like an array (using
step, if it is defined, or else the step of the original buffer to
determine the length of each field). It also could be masked to use
field access. For example:

m = bitwiser.mask{
  type  = { 0, 4 },
  flags = { 4, 4, __step = 1 },
  payload = { 8 }
}

b = bitwiser.buffer{ 0xff, 0, 0xff, 0 } -- new buffers have step = 8, by default
b[1] --> 0xff

b:mask(m)

b.flags[1] = false --> unsets bit-4 (0-based)
b.flags[4] --> returns bit-7 (0-based), 1 in this case

b.payload:mask{ padding = { 0, 8 }, data = { 8, __step = 16 } }
b.payload.data[1] --> returns 2 bytes from bit-16 (0-based) of the
original buffer,
                          -- 0x00ff or 0xff00 depending on platform
endianess, in this case

>>   buf:rawget(0, 9) ~> if net flag is *true*: takes 16 bits from
>
> You know, I think rawget is badly named.  "Raw" implies unmodified.
> And byteswapping is a form of modification.  Maybe you can find a better
> name.

Yes, you're right. I changed that to :get(offset, length).

>> >> buf:mask(3)
>> >> buf[3] ~> accesses 3 bits from bit-6 position
>> >
>> > What does that mean? Does it return the top-most 2 bits from the first
>> > byte plus the least significant bit fom the second byte of the buffer?
>>
>> It means the least-most 2 bits from the first byte and the LSB from the 
>> second.
>
> I don't know what "least-most" means.  But since you don't seem to
> agree with my questions I assume you intend that to be the opposite of
> "top-most" then your statement doesn't make sense to me.  Translated
> into normal terms it would return bits 0 and 1 of the first byte and bit
> 0 of the second one.  They are not even contigous.

Really sorry about that. It was a brain-segfault (or "dyslexia"). I
intended to say the 'bottom-least' 2 bits from the first byte
prepending the 'top-most' 1 bit from the second byte ([ b6 | b7 | b8
], 0-based). Anyway, I just aborted this kind of array access.

>> > What is 'length' for?
>>
>> Offset and length could be used to impose boundaries to the mask.
>
> Don't tell it to me.  Write it into the documentation. :-)

Sure. As soon as I have one =).

>> >> buf:mask{ 2, 2, 32, 9 }
>> >> buf[2] ~> accesses 2 bits from bit-2 position
>> >
>> > What exactly would "buf[3]" return.  Please be explicit in whether you are
>> > counting byte offsets or bit offsets.  I can't figure that out from your
>> > description.
>>
>> It would return 32 bits (converted or not, depending on 'net' flag)
>> from bit-4 (MSB-ordered).
>
> From "bit-4"? From your own use of 1-based indices don't you mean the
> 5th bit?

Yes, bit-4 0-based or 5th. I'm using 1-based for Lua array access and
0-based for bit offsets. Sorry, I should have been clearer about it.

>> > Personally, the idea of making array access to the buffer depend on
>> > state stored in the buffer does not look appealing to me.  It prevents
>> > buffers to be passed around because consumers don't know what they will
>> > get back on array access.
>>
>> I think it could be useful to access nonaligned and aligned data
>> easily without caring about naming fields.
>
> I guess I did not make myself understandable.  Storing that state globally
> means that every function you hand an lbuf too has to set the global
> mask before it accesses the lbuf, because it can't know if the current
> mask is compatible with its own use.  And after call to a function, the
> caller has to reset the mask, because it could have been changed.
>
> so you end up with code like:
>
>   function filter1(packet)
>     packet:mask(whatever)
>     ...
>     filter2(packet)
>     packet:mask(whatever)  -- restore our mask
>     ...
>     filter3(packet)
>     packet:mask(whatever)  -- restore our mask
>     ...
>   end
>
>   function filter2(packet)
>     packet:mask(something_else)
>     ...
>     filter4(packet)
>     packet:mask(something_else)
>     ...
>   end
>
>   function filter3(packet)
>     packet:mask(entirely_different)
>     ...
>   end
>
> That, of course, is completely idiotic.
>
> The newbuf = buf:mask() functionalty completely avoids that problem and is
> sufficient.  I would provide only that interface.

Sure, but it would avoid buffer userdatum reusing too. I mean, you
couldn't change the buffer mask once it is masked. Moreover, the first
buffer in this masking chain would never be effectively masked (the
mask would be applied only to the new one). It would be used only as
the raw data handler. I prefer to have another function (like,
segment([offset, lenght])) to return a new buffer userdatum pointing
to the same raw data. In this case you would have:

function filter1(packet)
  packet:mask(whatever)
  ...
  filter2(packet:segment()) -- passes a new userdatum containing the
entire original data
  -- don't need to restore packet's mask here
  ...
  -- more commonly, passes a segment containing payload only
  filterN(packet:segment(payload_offset))
  ...
  -- or, if you have defined a payload field in the packet's mask
  filterN(packet.payload)
  ...
end

Regards
-- 
Lourival Vieira Neto
References:
- Lua in-kernel (lbuf library)
  - From: Lourival Vieira Neto
- Re: Lua in-kernel (lbuf library)
  - From: Christoph Badura
- Re: Lua in-kernel (lbuf library)
  - From: Lourival Vieira Neto
- Re: Lua in-kernel (lbuf library)
  - From: Christoph Badura
Prev by Date: Re: zero-length symlinks
Next by Date: Re: Lua in-kernel (lbuf library)
Previous by Thread: Re: Lua in-kernel (lbuf library)
Next by Thread: Re: Lua in-kernel (lbuf library)
Indexes:
Home | Main Index | Thread Index | Old Index