tech-userlevel archive


Re: regex change



On Thu, Nov 10, 2022 at 7:44 AM Christos Zoulas <christos%astron.com@localhost> wrote:

> In article <CAJgzZorYDWZWYur9wGGdpLoCXeBJNXepmKBioUYXSXDu-jKp3w%mail.gmail.com@localhost>,
> enh <enh%google.com@localhost> wrote:
> >
> >i see (having synced the current NetBSD lib/libc/regex to Android) that
> >regcomp() no longer allows unescaped `{` and `}`. this is technically
> >correct (since POSIX explicitly calls this undefined behavior), but it's a
> >change from historical NetBSD behavior.
> >
> >specifically (since this was the existing Android test that failed) this is
> >now rejected:
> >
> >{\n 1 : 2,\n 3 : -7,\n -1 : 1,\n -2 : {(0x[0-9a-f]{2}, ){31}0x[0-9a-f]{2}},\n -3 : {(0x[0-9a-f]{2}, ){31}0x[0-9a-f]{2}},\n }
> >
> >while this is of course fine:
> >
> >\\{\n 1 : 2,\n 3 : -7,\n -1 : 1,\n -2 : \\{(0x[0-9a-f]{2}, ){31}0x[0-9a-f]{2}\\},\n -3 : \\{(0x[0-9a-f]{2}, ){31}0x[0-9a-f]{2}\\},\n \\}
> >
> >but the former used to be interpreted as the latter :-)
> >
> >macOS (which also has a BSD-based libc) seems to still allow the former
> >(but they might just be lagging behind, like Android was?).
> >
> >glibc does not allow it.
> >
> >i don't yet have any data on app compat failures, just this one unit test
> >for the OS itself so far, but i'm curious --- was this a _deliberate_
> >behavior change, or is this a surprise?
> >
> >i don't have a plan for Android yet, and i'll probably not have one
> >until/unless we do see more than one test hit this in practice, but i'm
> >trying to think ahead about what my options might be. i'd be interested to
> >know whether -- if it came to it -- you'd consider a patch to restore the
> >old behavior. or whether you consider this change in NetBSD's behavior to
> >be a bug in its own right. alternatively, i can always have a "what version
> >of the OS were you expecting to run on?" check and offer both behaviors for
> >a few years before retiring the old behavior (because the Play Store
> >requires that you move forward with your OS version support eventually).
>
> This was done as part of syncing the NetBSD regex code with FreeBSD's to
> get utf8 support. With it came support for GNU regex extensions (\b\s\w
> etc), which are easier to implement if escaped ordinary characters are
> expected to behave the same way as unescaped ones:
>
>
> https://github.com/freebsd/freebsd-src/commit/adeebf4cd47c3e85155d92f386bda5e519b75ab2
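
For anyone wanting to check how a particular libc treats the two patterns quoted above, a minimal sketch using only the POSIX regcomp()/regerror()/regfree() calls could look like the following. It assumes the patterns are extended REs (REG_EXTENDED), since they use unescaped `(...)` groups and `{n}` intervals; the helper name try_compile is made up for this example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: try to compile `pattern` with regcomp() and the
 * given cflags. Returns 0 on success (and leaves `msg` empty); otherwise
 * returns the regcomp() error code and copies regerror()'s text into msg.
 */
int try_compile(const char *pattern, int cflags, char *msg, size_t msglen)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);

    if (rc == 0) {
        if (msglen > 0)
            msg[0] = '\0';
        regfree(&re);   /* only a successfully compiled regex is freed */
        return 0;
    }
    /* regerror() truncates to msglen bytes and NUL-terminates if msglen > 0 */
    regerror(rc, &re, msg, msglen);
    return rc;
}
```

With REG_EXTENDED, the escaped pattern should compile wherever `\{` and `\}` are read as literal braces (glibc does this). The outcome for the unescaped pattern is exactly the implementation-dependent behavior this thread is about, so it is better to print `msg` than to assume a result.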


ah, thanks for that link. stupidly, although i'd seen that the NetBSD
changes were syncing with FreeBSD, i didn't go to look at the original
FreeBSD commits.

cool, that sounds like i (a) have a clear "why" argument should anyone ask,
and (b) a ready-made `PFLAG_LEGACY_ESC` escape hatch for backwards
compatibility.
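
The "offer both behaviors for a few years" option could, in principle, also be approximated outside libc by rewriting patterns before they reach regcomp(). The sketch below is purely illustrative. legacy_escape_braces is a hypothetical helper; it is not `PFLAG_LEGACY_ESC` and not an existing NetBSD, FreeBSD, or Android API. It assumes the old parser's rule was that `{` only starts an interval when a digit follows it, and that every other `{`, along with any `}` outside such an interval, is an ordinary character. Bracket expressions and existing backslash escapes are copied through unchanged.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical compatibility shim: rewrite an ERE so that braces the
 * (assumed) legacy rule treated as literals become explicit \{ and \}.
 * Returns a malloc()ed string the caller must free(), or NULL on failure.
 */
char *legacy_escape_braces(const char *in)
{
    size_t len = strlen(in);
    /* upper bound: each input byte becomes at most two output bytes */
    char *out = malloc(2 * len + 1);
    char *o = out;
    int in_interval = 0;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        char c = in[i];

        if (c == '\\' && i + 1 < len) {
            /* existing escape: copy the backslash and its operand as-is */
            *o++ = c;
            *o++ = in[++i];
        } else if (c == '[') {
            /* bracket expression: braces (and backslashes) are already
             * literal inside [...], so copy it through verbatim */
            size_t j = i + 1;
            if (j < len && in[j] == '^')
                j++;
            if (j < len && in[j] == ']')
                j++;    /* a leading ']' is a member, not the terminator */
            while (j < len && in[j] != ']') {
                if (in[j] == '[' && j + 1 < len &&
                    (in[j + 1] == ':' || in[j + 1] == '.' || in[j + 1] == '=')) {
                    /* skip [:class:], [.coll.] or [=equiv=] up to its "X]" */
                    char delim = in[j + 1];
                    j += 2;
                    while (j + 1 < len && !(in[j] == delim && in[j + 1] == ']'))
                        j++;
                    j += 2;
                } else {
                    j++;
                }
            }
            /* copy through the closing ']' (or to the end if unterminated) */
            size_t end = (j < len) ? j + 1 : len;
            memcpy(o, in + i, end - i);
            o += end - i;
            i = end - 1;
        } else if (c == '{') {
            if (!in_interval && i + 1 < len && isdigit((unsigned char)in[i + 1])) {
                in_interval = 1;    /* a real interval such as {2} or {1,3} */
            } else {
                *o++ = '\\';        /* literal brace under the legacy rule */
            }
            *o++ = c;
        } else if (c == '}') {
            if (in_interval)
                in_interval = 0;    /* closes the current interval */
            else
                *o++ = '\\';        /* stray '}' becomes a literal */
            *o++ = c;
        } else {
            *o++ = c;
        }
    }
    *o = '\0';
    return out;
}
```

Applied to the unescaped pattern from the original message, this produces the escaped form quoted there. A caller would only take this path behind whatever target-version check decides that an app expects the old behavior; that check is caller-specific and left out here.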

i'll be interested to see what iOS/macOS does here (because ideally Android
and iOS would do the _same_ thing so there's less for mobile developers to
worry about!). oh... looks like they use TRE instead?
https://opensource.apple.com/source/Libc/Libc-1439.141.1/regex/
interesting; i thought it was only musl that used TRE behind the scenes.

anyway, thanks!


>
> Best,
>
> christos
>
>

