tech-kern archive
Re: A Library for Converting Data to and from C Structs for Lua
On Sun, Nov 24, 2013 at 10:06 PM, James K. Lowden
<jklowden%schemamania.org@localhost> wrote:
> On Sat, 23 Nov 2013 11:46:19 -0200
> Lourival Vieira Neto <lourival.neto%gmail.com@localhost> wrote:
>> On Sat, Nov 23, 2013 at 1:22 AM, James K. Lowden
>> <jklowden%schemamania.org@localhost> wrote:
>> > On Mon, 18 Nov 2013 09:07:52 +0100
>> > Marc Balmer <marc%msys.ch@localhost> wrote:
>> >
>> >> After discussion with lneto@ and others we realised that there are
>> >> several such libraries around, and that I as well as lneto@ wrote
>> >> one. So we decided to merge our works
>> >
>> > How do you deal with the usual issues of alignment and endianism?
>>
>> d = data.new{0xF0, 0xFF, 0x00} -- creates a new data object with 3 bytes.
>> d:layout{
>> x = { __offset = 0, __length = 3 },
>> y = { __offset = 8, __length = 16, __endian = 'net' },
>> z = { __offset = 0, __step = 9 }
>> }
>>
>> d.x -- returns the 3 most significant bits from d (that is, 7)
>> d.y -- returns 16 bits counting from bit-8 most significant.
>> -- in this case, these 2 bytes are converted using ntohs(3),
>> -- that is 0xFF00.
>> d.z[1] -- returns the 9 most significant bits from d (that is, 0x1E1).
>
> Hi Lourival,
>
> Thanks for your answer. A few questions and observations, if I may.
You are welcome. Of course you may =).
> 1. What is the significance of the leading underscores?
They are used as a marker to distinguish parameters from field names.
This is especially useful for nested fields and for the global behavior
of the layout. For example, d:layout{__endian = 'net', .... } would use
big endian for all fields (except those that set it explicitly).
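To illustrate the convention, here is a minimal sketch in plain Lua of
how a layout table could be split into parameters and fields. The
helpers split_layout and field_endian are hypothetical, not part of the
library, and both the "host" fallback and the 'little' value are
assumptions made only for this example.

```lua
-- Hypothetical helper: separate layout-wide parameters (keys starting
-- with "__") from field descriptions.
local function split_layout(layout)
  local params, fields = {}, {}
  for key, value in pairs(layout) do
    if type(key) == "string" and key:sub(1, 2) == "__" then
      params[key] = value
    else
      fields[key] = value
    end
  end
  return params, fields
end

-- Effective byte order of a field: its own __endian if set, otherwise
-- the layout-wide one. The "host" fallback is an assumption.
local function field_endian(params, field)
  return field.__endian or params.__endian or "host"
end

local params, fields = split_layout{
  __endian = "net",
  x = { __offset = 0, __length = 16 },
  y = { __offset = 16, __length = 16, __endian = "little" },
}
print(field_endian(params, fields.x)) -- net (inherited from the layout)
print(field_endian(params, fields.y)) -- little (set on the field)
```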
> 2. I assume you mean d.x represents the three *least* significant bits.
Why? I really meant three *most* significant bits. In this example:
[* 1 | 1 | 1 * | 1 | 0 | 0 | 0 | 0 ].
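The same numbering can be checked with a small sketch in plain Lua. The
extract function below is a hypothetical stand-in written for this
message, not the library's code; it counts bit 0 as the most
significant bit of the first byte and uses only arithmetic, so it does
not depend on any bit library.

```lua
-- Hypothetical helper: read `length` bits starting `offset` bits from
-- the most significant bit of the first byte.
local function extract(bytes, offset, length)
  local value = 0
  for pos = offset, offset + length - 1 do
    local byte = bytes[math.floor(pos / 8) + 1] -- Lua tables are 1-based
    local shift = 7 - pos % 8                   -- bit 0 is a byte's MSB
    local bit = math.floor(byte / 2 ^ shift) % 2
    value = value * 2 + bit                     -- accumulate MSB first
  end
  return value
end

local d = { 0xF0, 0xFF, 0x00 }
print(extract(d, 0, 3))  -- 7, the three most significant bits
print(extract(d, 8, 16)) -- 65280, that is 0xFF00
```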
> I don't understand "step", not that it matters.
Sorry, I didn't describe the API itself; I just gave a small example.
The __step parameter is used for array access. In the previous example,
d.z could be indexed using Lua array notation, where each position
corresponds to 9 bits of the data, starting from the most significant
bit, bit 0 (__offset = 0 and __step = 9).
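As a rough sketch of that indexing rule in plain Lua: element i
(1-based) covers __step bits starting at __offset + (i - 1) * __step.
Both extract and element are hypothetical helpers written for this
message, and the value shown for the second element follows from this
reading of __step rather than from the library itself.

```lua
-- Hypothetical helper: read `length` bits, counting bit 0 as the most
-- significant bit of the first byte.
local function extract(bytes, offset, length)
  local value = 0
  for pos = offset, offset + length - 1 do
    local byte = bytes[math.floor(pos / 8) + 1]
    value = value * 2 + math.floor(byte / 2 ^ (7 - pos % 8)) % 2
  end
  return value
end

-- Hypothetical helper: i-th element of an array field described by
-- __offset and __step.
local function element(bytes, field, i)
  local start = field.__offset + (i - 1) * field.__step
  return extract(bytes, start, field.__step)
end

local d = { 0xF0, 0xFF, 0x00 }
local z = { __offset = 0, __step = 9 }
print(string.format("0x%X", element(d, z, 1))) -- 0x1E1
print(string.format("0x%X", element(d, z, 2))) -- 0x1FC
```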
> For purposes of extracting/packing values in a buffer, offset and length
> are all you need.
In fact, you only *need* bit operations for that. However, I think
it would be more pleasant to have a declarative API for it.
> Semantics require a type system for the bit patterns. I guess "y" is
> implied to be a 16-bit integer, since it has endianism, but its
> signedness is unspecified. I suggest you enumerate all types you will
> support, and that that set encompass all types that a C compiler can
> generate.
I'm only handling integers.
> If you include an "ignore" type (cf. Perl's pack/unpack
> functions), you can drop "offset" from your description, for which
> you'll be glad eventually.
I'm considering an alternative syntax for suppressing offset declaration:
l = data.layout{
{ 'x', 1 }, -- most significant bit
{ 3 }, -- 3 bits of padding
{ 'y', 4 } -- 4 subsequent bits
}
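A sequential description like this one carries the same information as
explicit offsets. As a minimal sketch in plain Lua, the hypothetical
to_offsets function below (not part of the library) walks the entries
in order, treats an entry without a name as padding, and derives each
named field's __offset from the lengths before it.

```lua
-- Hypothetical helper: convert { name, length } / { length } entries
-- into explicit __offset/__length descriptions.
local function to_offsets(entries)
  local fields, offset = {}, 0
  for _, entry in ipairs(entries) do
    local name, length
    if type(entry[1]) == "string" then
      name, length = entry[1], entry[2]
    else
      length = entry[1] -- unnamed entry: padding, skipped but counted
    end
    if name then
      fields[name] = { __offset = offset, __length = length }
    end
    offset = offset + length
  end
  return fields
end

local fields = to_offsets{
  { 'x', 1 }, -- most significant bit
  { 3 },      -- 3 bits of padding
  { 'y', 4 }, -- 4 subsequent bits
}
print(fields.x.__offset, fields.x.__length) -- x: offset 0, length 1
print(fields.y.__offset, fields.y.__length) -- y: offset 4, length 4
```

The padding entry produces no field, so those 3 bits cannot be reached
through the resulting table.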
> For purposes of binary transfer, host endianism is unimportant; what
> matters is the endianism of the wire format. TCP/IP uses big-endian
> format by definition. ISTM that should be your default, too, else the
> same code compiled on two different machines means two different
> things.
It is not my intent to support network-only applications. In fact, one
of my use cases is supporting device drivers written in Lua.
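To make the byte-order point concrete, here is a small sketch in plain
Lua that reads the same two bytes under both orders. The decode helper
and its "big"/"little" arguments are hypothetical names chosen for this
example; they are not the library's __endian values.

```lua
-- Hypothetical helper: combine bytes into an unsigned integer under
-- the given byte order.
local function decode(bytes, endian)
  local value = 0
  if endian == "big" then
    -- first byte is the most significant one (network order)
    for i = 1, #bytes do value = value * 256 + bytes[i] end
  else
    -- first byte is the least significant one
    for i = #bytes, 1, -1 do value = value * 256 + bytes[i] end
  end
  return value
end

local raw = { 0xFF, 0x00 }
print(string.format("0x%04X", decode(raw, "big")))    -- 0xFF00
print(string.format("0x%04X", decode(raw, "little"))) -- 0x00FF
```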
> A 2-byte integer starting at a 5-bit offset is weird for a
> byte-addressable machine. I don't see a need to support bitfields
> unless you have an existing use case; bit arrays can always be
> transmitted as character arrays, which after all is how they appear in
> memory.
Sure, it *is* weird. But how should we handle data that is structured
that way?
> By "alignment" I was asking about padding and offsets in data
> structures that the C language leaves up to the implementation.
You have to describe exactly the layout you want to access. If you
don't describe a specific offset, you cannot reach it. The example
above has an explicit padding declaration. You can then do d:layout(l)
and access d.x or d.y, but you cannot reach those 3 bits of padding.
The same is true if you omit offset ranges (e.g., everything after the
first byte is inaccessible using that layout).
> For your extract format ("fmt"), you might want to consider the gdb
> x/fmt command because it encompasses everything you could need and is
> the soul of brevity.
It sounds like a good tip =).
> As far as I can tell, by the way, you're reinventing part of ASN.1.
> Nothing wrong with that, in and of itself; perhaps you can create
> something more convenient to use. But you might want to use it as a
> reference for functionality, and be ready to explain why your library
> should be used instead.
I really don't think that I'm reinventing ASN.1 or BER. I'm just
designing a little API to handle binary data in Lua, not a standard.
Note that I'm using Lua tables to describe data layouts, not a separate
syntax notation (like ASN.1).
Regards,
--
Lourival Vieira Neto