tech-userlevel archive


Re: printf(1), sh(1), POSIX.2 and octal escape sequences



On Wed, Jun 28, 2023 at 12:45:55PM -0400, Mouse wrote:
> >>> "\ddd", where ddd is a one, two, or three-digit octal number, shall
> >>> be written as a byte with the numeric value specified by the octal
> >>> number."
> >> [...]
> > I beg to differ: since due to this very unfortunate "variable length"
> > feature, your scanner has to read char by char, it can reject the
> > third digit since it would yield an out of range byte value.
> Is the size of a `byte' specified anywhere?
Not in C, which only requires CHAR_BIT >= 8 and otherwise allows just
about anything (incl. your DSP byte=char=short=int=32 bits example),
but POSIX defines bytes as 8-bit; quoth Issue 8 Draft 3
(this is long-standing, I just have that PDF open), XRAT A.3:
123777  Byte

123778  The restriction that a byte is now exactly eight bits was a conscious decision by the standard
123779  developers. It came about due to a combination of factors, primarily the use of the type int8_t
123780  within the networking functions and the alignment with the ISO/IEC 9899: 1999 standard,
123781  where the intN_t types were first defined.

123782  According to the ISO/IEC 9899: 1999 standard:

123783   The [u]intN_t types must be two’s complement with no padding bits and no illegal values.

123784   All types (apart from bit fields, which are not relevant here) must occupy an integral
123785    number of bytes.

123786   If a type with width W occupies B bytes with C bits per byte (C is the value of
123787    {CHAR_BIT}), then it has P padding bits where P+W=B∗C.

123788   Therefore, for int8_t P=0, W=8. Since B≥1, C≥8, the only solution is B=1, C=8.

123789  The standard developers also felt that this was not an undue restriction for the current state-of-
123790  the-art for this version of the standard, but recognize that if industry trends continue, a wider
123791  character type may be required in the future.

And similarly XBD, <limits.h> says
10172     {CHAR_BIT}
10173     Number of bits in a type char.
10174 CX  Value: 8
(where "CX" shading indicates "Extension to the ISO C standard").
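
Given an 8-bit byte, the scanner behaviour argued for upthread (read char by
char, refuse a third digit that would overflow the byte) could be sketched
roughly as below. This is only an illustration of that reading, not something
the quoted text mandates and not how any particular printf(1) or sh(1) does it;
parse_octal_escape is a made-up helper name.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, UCHAR_MAX */
#include <stddef.h>   /* size_t */

/* POSIX fixes this at 8; plain ISO C only guarantees CHAR_BIT >= 8. */
static_assert(CHAR_BIT == 8, "POSIX requires 8-bit bytes");

/*
 * Hypothetical helper: parse the digits that follow the backslash in "\ddd".
 * Reads at most three octal digits from s, but stops before a digit that
 * would push the value past UCHAR_MAX (255 when CHAR_BIT is 8).
 * Stores the byte value in *out and returns the number of digits consumed
 * (0 if s does not start with an octal digit).
 */
static size_t
parse_octal_escape(const char *s, unsigned char *out)
{
	unsigned int value = 0;
	size_t n = 0;

	while (n < 3 && s[n] >= '0' && s[n] <= '7') {
		unsigned int next = value * 8 + (unsigned int)(s[n] - '0');

		if (next > UCHAR_MAX)
			break;	/* leave this digit to the caller as literal text */
		value = next;
		n++;
	}
	*out = (unsigned char)value;
	return n;
}
```

Under this reading "\377" consumes all three digits, while "\777" consumes
only "77" and leaves the last "7" to be output as an ordinary character.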

Funnily, one place in the teletype definitions still uses "bits per byte"
instead of "bits per character" as a historical artifact.

The behaviour of uudecode is likewise undefined if the encoder and
decoder have different byte widths.

Best,
наб



