tech-userlevel archive
Re: regex, signed chars and 0x80 to 0xFF
On Sun, Dec 22, 2024 at 12:38:38AM +0100, Roland Illig wrote:
> On 21.12.2024 at 20:21, tlaronde%kergis.com@localhost wrote:
> > Problem found: NetBSD's flex, which is the one used, doesn't understand
> > the idiom, while (pkgsrc) flex-2.6.4 is able to interpret such a regex.
> >
> > Nonetheless, does anyone have information about the handling of negative
> > chars in a regex? (There are discussions, here and there, about
> > problems caused by a character such as -1 being mistaken for EOF, so it
> > seems that a regex should always use unsigned char.) But I guess most
> > code uses char, and only the ASCII range (whether these values
> > represent ASCII characters or not), i.e. positive values, is safe.
>
> The flex message "negative range in character class" means that in a
> character range [F-L], F is greater than L. It's not about negative
> character numbers but about a backwards range.
>
> I tried to reproduce the problem by running external/bsd/flex/bin/lex
> from the netbsd-8 branch, compiled on NetBSD 10.99.x, on
> https://raw.githubusercontent.com/wine-mirror/wine/wine-5.0.5/dlls/msxml3/xslpattern.l,
> but everything went fine.
>
> If you have the NetBSD 8 source at hand, you could add a printf
> statement right above the "negative range" message in
> external/bsd/flex/dist/src/parse.y:
>     if ($2 > $4) fprintf(stderr, "from %d to %d\n", $2, $4);
>
> Then, rebuild flex and run it on the file. I'm curious whether you can
> reproduce the message, and what the actual character numbers are that
> are backwards.
I'm on:
NetBSD cauchy.polynum.local 10.99.12 NetBSD 10.99.12 (CAUCHY) #2: Thu Sep 26 16:02:35 CEST 2024 tlaronde@cauchy.polynum.local:/data/m/netbsd-10.99/sys/arch/amd64/compile/CAUCHY amd64
I added the statement to the current sources and recompiled with
USETOOLS=no, hence using the installed toolchain, and the problem
doesn't appear!?
What is installed is:
/usr/bin/lex: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /usr/libexec/ld.elf_so, for NetBSD 10.99.12, not stripped
Dated:
-r-xr-xr-x 3 root wheel 467024 Oct 5 14:49 /usr/bin/lex
(kernel is from 27 September)
What is compiled is:
./lex: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /usr/libexec/ld.elf_so, for NetBSD 10.99.12, not stripped
Details:
-rwxr-xr-x 1 tlaronde wheel 467048 Dec 22 08:55 ./lex
Note: I have removed the added printf, so the source should be
identical, yet the two binaries differ in size by 24 bytes (467048
vs. 467024).
But I see no change to flex in the git log between October and now,
and I'm compiling with the installed tools, headers and libraries,
which should be the same ones that were used to build userland (I did
a complete build, i.e. including rebuilding the tools)...
I'm puzzled...
Obviously, I will have to rebuild userland. But I'd like to understand
what caused the difference.
Thanks for the tips!
--
Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
http://www.kergis.com/
http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C