tech-kern archive


Re: A Library for Converting Data to and from C Structs for Lua



On Sun, Nov 24, 2013 at 10:06 PM, James K. Lowden
<jklowden%schemamania.org@localhost> wrote:
> On Sat, 23 Nov 2013 11:46:19 -0200
> Lourival Vieira Neto <lourival.neto%gmail.com@localhost> wrote:
>> On Sat, Nov 23, 2013 at 1:22 AM, James K. Lowden
>> <jklowden%schemamania.org@localhost> wrote:
>> > On Mon, 18 Nov 2013 09:07:52 +0100
>> > Marc Balmer <marc%msys.ch@localhost> wrote:
>> >
>> >> After discussion with lneto@ and others we realised that there are
>> >> several such libraries around, and that I as well as lneto@ wrote
>> >> one.  So we decided to merge our works
>> >
>> > How do you deal with the usual issues of alignment and endianism?
>>
>> d = data.new{0xF0, 0xFF, 0x00} -- creates a new data object with 3 bytes.
>> d:layout{
>>   x = { __offset = 0, __length = 3 },
>>   y = { __offset = 8, __length = 16, __endian = 'net' },
>>   z = { __offset = 0, __step = 9 }
>> }
>>
>> d.x -- returns the 3 most significant bits from d (that is, 7)
>> d.y -- returns 16 bits counting from bit-8 most significant.
>>       -- in this case, these 2 bytes are converted using ntohs(3),
>>       -- that is 0xFF00.
>> d.z[1] -- returns the 9 most significant bits from d (that is, 0x1E1).
>
> Hi Lourival,
>
> Thanks for your answer.  A few questions and observations, if I may.

You are welcome. Of course you may =).

> 1.  What is the significance of the leading underscores?

It is used as a marker to distinguish parameters from field names. It is
especially useful for nested fields and for the global behavior of the
layout. For example, d:layout{__endian = 'net', .... } would use big
endian for all fields (except those that set it explicitly).
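
A minimal sketch of how the "__" prefix could separate parameters from
field names, and how a layout-wide __endian could be overridden per
field. The functions split_layout and resolve_endian, the 'little'
value and the 'host' default are assumptions made for illustration;
they are not taken from the library.

```lua
-- Hypothetical helper: split a layout table into parameters
-- (keys starting with "__") and field descriptions (all other keys).
function split_layout(layout)
  local params, fields = {}, {}
  for key, value in pairs(layout) do
    if type(key) == "string" and key:sub(1, 2) == "__" then
      params[key:sub(3)] = value      -- "__endian" is stored as "endian"
    else
      fields[key] = value
    end
  end
  return params, fields
end

-- Hypothetical helper: an explicit per-field __endian wins, then the
-- layout-wide __endian, then 'host' (an assumed default).
function resolve_endian(layout)
  local params, fields = split_layout(layout)
  local resolved = {}
  for name, field in pairs(fields) do
    resolved[name] = field.__endian or params.endian or 'host'
  end
  return resolved
end

endian = resolve_endian{
  __endian = 'net',
  a = { __offset = 0,  __length = 16 },
  b = { __offset = 16, __length = 16, __endian = 'little' },
}
-- endian.a is inherited from the layout; endian.b keeps its own setting.
```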

> 2.  I assume you mean d.x represents the three *least* significant bits.

Why? I really meant three *most* significant bits. In this example:
[* 1 | 1 | 1 * | 1 | 0 | 0 | 0 | 0 ].
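
The most-significant-first bit numbering can be sketched in plain Lua.
The function extract_bits is a hypothetical helper, not the library's
API; it takes a Lua array of byte values and uses only arithmetic, so
it does not assume any particular bit library.

```lua
-- Hypothetical helper: read `length` bits from `bytes` (an array of
-- integers 0..255), starting `offset` bits from the most significant
-- bit of the first byte.
function extract_bits(bytes, offset, length)
  local value = 0
  for i = offset, offset + length - 1 do
    local byte = bytes[math.floor(i / 8) + 1]   -- Lua arrays are 1-based
    local shift = 7 - (i % 8)                   -- bit 0 is the MSB of a byte
    local bit = math.floor(byte / 2 ^ shift) % 2
    value = value * 2 + bit                     -- earlier bits are more significant
  end
  return value
end

bytes = {0xF0, 0xFF, 0x00}
x = extract_bits(bytes, 0, 3)    -- top three bits of 0xF0: 7
y = extract_bits(bytes, 8, 16)   -- second and third bytes: 0xFF00
```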

> I don't understand "step", not that it matters.

Sorry, I didn't describe the API itself; I just gave a small
example. The __step parameter is used for array access. In the
previous example, d.z can be indexed using Lua array notation,
where each position corresponds to 9 bits of data, starting from
bit 0, counting from the most significant bit (__offset = 0 and
__step = 9).
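
A rough sketch of array access with a step, assuming that element i
(1-based, as in d.z[1]) covers __step bits starting at
__offset + (i - 1) * __step. The names bits_at and bit_array are
hypothetical, and returning nil for out-of-range elements is an
assumption, not documented behavior.

```lua
-- Hypothetical helper: read `length` bits, most significant bit first.
function bits_at(bytes, offset, length)
  local value = 0
  for i = offset, offset + length - 1 do
    local byte = bytes[math.floor(i / 8) + 1]
    value = value * 2 + math.floor(byte / 2 ^ (7 - i % 8)) % 2
  end
  return value
end

-- Hypothetical array view over `bytes`: z[i] reads `step` bits.
function bit_array(bytes, offset, step)
  return setmetatable({}, {
    __index = function(_, i)
      if type(i) ~= "number" or i < 1 then return nil end
      local first = offset + (i - 1) * step
      if first + step > #bytes * 8 then return nil end   -- past the end
      return bits_at(bytes, first, step)
    end
  })
end

z = bit_array({0xF0, 0xFF, 0x00}, 0, 9)
-- z[1] covers bits 0..8   (1 1110 0001  = 0x1E1)
-- z[2] covers bits 9..17  (1 1111 1100  = 0x1FC)
-- z[3] would need bits 18..26, which do not fit in 3 bytes
```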

> For purposes of extracting/packing values in a buffer, offset and length
> are all you need.

In fact, you only *need* bit operations for that. However, I think
it would be more pleasant to have a declarative API for it.

> Semantics require a type system for the bit patterns.  I guess "y" is
> implied to be a 16-bit integer, since it has endianism, but its
> signedness is unspecified.  I suggest you enumerate all types you will
> support, and that that set encompass all types that a C compiler can
> generate.

I'm only handling integers.

> If you include an "ignore" type (cf. Perl's pack/unpack
> functions), you can drop "offset" from your description, for which
> you'll be glad eventually.

I'm considering an alternative syntax for suppressing offset declaration:

l = data.layout{
  { 'x', 1 }, -- most significant bit
  { 3 },      -- 3 bits of padding
  { 'y', 4 } -- 4 subsequent bits
}
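
One way such an offset-free description could be turned into explicit
offsets is to accumulate the lengths in order. This is only a sketch:
sequential_layout is a hypothetical function, and it assumes that an
entry with a single element is padding.

```lua
-- Hypothetical conversion: each entry is { name, length } for a field
-- or { length } for padding; offsets are counted in bits from the start.
function sequential_layout(entries)
  local layout, offset = {}, 0
  for _, entry in ipairs(entries) do
    local name, length
    if #entry == 1 then
      length = entry[1]               -- unnamed entry: padding only
    else
      name, length = entry[1], entry[2]
    end
    if name then
      layout[name] = { __offset = offset, __length = length }
    end
    offset = offset + length          -- padding still advances the offset
  end
  return layout
end

l = sequential_layout{
  { 'x', 1 },  -- most significant bit
  { 3 },       -- 3 bits of padding
  { 'y', 4 }   -- 4 subsequent bits
}
-- l.x is { __offset = 0, __length = 1 }
-- l.y is { __offset = 4, __length = 4 }
```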

> For purposes of binary transfer, host endianism is unimportant; what
> matters is the endianism of the wire format.  TCP/IP uses big-endian
> format by definition.  ISTM that should be your default, too, else the
> same code compiled on two different machines means two different
> things.

It is not my intent to support only network applications. In fact, one
of my use cases is support for writing device drivers in Lua.
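
To illustrate why a single fixed byte order may not suit every use
case, here is a small sketch of reading the same two bytes in both
orders. The function read_u16 and the 'little' name are assumptions
made for this example; only 'net' appears in the layout shown earlier.

```lua
-- Hypothetical helper: combine two bytes into an unsigned 16-bit value.
-- 'net' (big endian) treats the first byte as most significant;
-- any other value is treated here as little endian.
function read_u16(b1, b2, endian)
  if endian == 'net' then
    return b1 * 256 + b2
  else
    return b2 * 256 + b1
  end
end

wire = read_u16(0xFF, 0x00, 'net')      -- 0xFF00, as on the network
reg  = read_u16(0xFF, 0x00, 'little')   -- 0x00FF, e.g. a little-endian register
```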

> A 2-byte integer starting at a 5-bit offset is weird for a
> byte-addressable machine.  I don't see a need to support bitfields
> unless you have an existing use case; bit arrays can always be
> transmitted as character arrays, which after all is how they appear in
> memory.

Sure, it *is* weird. But how should we handle data that is
structured that way?

> By "alignment" I was asking about padding and offsets in data
> structures that the C language leaves up to the implementation.

You have to describe exactly the layout you want to access. If you
don't describe a specific offset, you cannot reach it. The example
above has an explicit padding declaration. You can then do
d:layout(l) and read d.x or d.y, but you cannot reach those 3 bits of
padding. The same is true if you omit offset ranges (e.g., everything
after the first byte is inaccessible using that layout).

> For your extract format ("fmt"), you might want to consider the gdb
> x/fmt command because it encompasses everything you could need and is
> the soul of brevity.

It sounds like a good tip =).

> As far as I can tell, by the way, you're reinventing part of ASN.1.
> Nothing wrong with that, in and of itself; perhaps you can create
> something more convenient to use.  But you might want to use it as a
> reference for functionality, and be ready to explain why your library
> should be used instead.

I really don't think that I'm reinventing ASN.1 or BER. I'm just
designing a small API to handle binary data in Lua, not a standard.
Note that I'm using Lua tables to describe data layouts, not a
separate syntax notation (like ASN.1).

Regards,
-- 
Lourival Vieira Neto

