tech-userlevel archive

Re: printf(1), sh(1), POSIX.2 and octal escape sequences



On Wed, Jun 28, 2023 at 06:58:57PM +0200, Roland Illig wrote:
> On 28.06.2023 at 12:57, tlaronde%polynum.com@localhost wrote:
> > But isn't it incorrect? POSIX 2018 says:
> >
> > '"\ddd", where ddd is a one, two, or three-digit octal number, shall be
> > written as a byte with the numeric value specified by the octal number.'
> 
> The main intended takeaway from this sentence is that \0000 is not a
> single escape sequence but rather the escape sequence '\000' followed by
> the digit '0'. That's different from the hex escape sequence introduced
> by C90, which allows an arbitrary number of digits, so '\x0000000012'
> forms a single escape sequence.
> 
> That sentence defines that '\778' is parsed as '\77' followed by the
> digit '8', as '8' is not an octal digit.
> 
> That sentence also says that '\777' is parsed as a single escape
> sequence (due to the common lexer rule that at each time, the longest
> possible token is matched), as '777' is a syntactically valid octal
> number. The range constraints are usually not expressed in the grammar,
> they are left to another layer of the parser or interpreter instead.
> 
> So '\778' should be parsed as '\77' followed by '8', and '\777' should
> be parsed as '\777' and then rejected as out of range, just like a port
> number 70000 is rejected as well.

I agree with the interpretation based on the lexer. But as for the
"reject", POSIX says nothing about it, and the result is simply
truncated to 8 bits.
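
To make the two readings concrete, here is a minimal sketch in Python
of the lexing rule discussed above (at most three octal digits, longest
match) with both overflow policies side by side. It is not taken from
any printf(1) implementation; the names parse_octal_escape, expand and
on_overflow are hypothetical, and only the "\ddd" escapes are handled.

```python
OCTAL_DIGITS = "01234567"


def parse_octal_escape(s, i):
    """Parse the escape at s[i] == '\\'; return (value, next_index).

    Consumes one to three octal digits, taking the longest match.
    """
    j = i + 1
    value = 0
    while j < len(s) and j - (i + 1) < 3 and s[j] in OCTAL_DIGITS:
        value = value * 8 + int(s[j])
        j += 1
    if j == i + 1:
        raise ValueError("no octal digit after backslash")
    return value, j


def expand(s, on_overflow="truncate"):
    """Expand "\\ddd" escapes in s and return the resulting bytes.

    on_overflow is an assumption of this sketch: "truncate" keeps the
    low 8 bits of the value, "reject" raises ValueError instead.
    """
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in OCTAL_DIGITS:
            value, i = parse_octal_escape(s, i)
            if value > 0xFF:
                if on_overflow == "reject":
                    raise ValueError("octal escape out of range: %o" % value)
                value &= 0xFF  # keep only the low 8 bits
            out.append(value)
        else:
            # Any other character is copied through unchanged.
            out.extend(s[i].encode("utf-8"))
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(expand(r"\0000"))  # NUL byte, then the digit '0'
    print(expand(r"\778"))   # byte 077, then the digit '8'
    print(expand(r"\777"))   # 0777 is 0x1FF; truncated to 0xFF
```

With the "truncate" policy, '\777' yields the single byte 0xFF; with
"reject", the same input is parsed identically and then refused, which
is the range check that the grammar itself does not express.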

The devil is indeed in the details...
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

