tech-userlevel archive


Re: regex change



In article <CAJgzZorYDWZWYur9wGGdpLoCXeBJNXepmKBioUYXSXDu-jKp3w%mail.gmail.com@localhost>,
enh  <enh%google.com@localhost> wrote:
>
>i see (having synced the current NetBSD lib/libc/regex to Android) that
>regcomp() no longer allows unescaped `{` and `}`. this is technically
>correct (since POSIX explicitly calls this undefined behavior), but it's a
>change from historical NetBSD behavior.
>
>specifically (since this was the existing Android test that failed) this is
>now rejected:
>
>{\n 1 : 2,\n 3 : -7,\n -1 : 1,\n -2 : {(0x[0-9a-f]{2},
>){31}0x[0-9a-f]{2}},\n -3 : {(0x[0-9a-f]{2}, ){31}0x[0-9a-f]{2}},\n }
>
>while this is of course fine:
>
>\\{\n 1 : 2,\n 3 : -7,\n -1 : 1,\n -2 : \\{(0x[0-9a-f]{2},
>){31}0x[0-9a-f]{2}\\},\n -3 : \\{(0x[0-9a-f]{2}, ){31}0x[0-9a-f]{2}\\},\n
>\\}
>
>but the former used to be interpreted as the latter :-)
>
>macOS (which also has a BSD-based libc) seems to still allow the former
>(but they might just be lagging behind, like Android was?).
>
>glibc does not allow it.
>
>i don't yet have any data on app compat failures, just this one unit test
>for the OS itself so far, but i'm curious --- was this a _deliberate_
>behavior change, or is this a surprise?
>
>i don't have a plan for Android yet, and i'll probably not have one
>until/unless we do see more than one test hit this in practice, but i'm
>trying to think ahead about what my options might be. i'd be interested to
>know whether -- if it came to it -- you'd consider a patch to restore the
>old behavior. or whether you consider this change in NetBSD's behavior to
>be a bug in its own right. alternatively, i can always have a "what version
>of the OS were you expecting to run on?" check and offer both behaviors for
>a few years before retiring the old behavior (because the Play Store
>requires that you move forward with your OS version support eventually).

This was done as part of syncing the NetBSD regex code with FreeBSD's to
get UTF-8 support. With it came support for GNU regex extensions (\b, \s, \w, etc.),
which are easier to implement if escaped ordinary characters are expected
to behave the same way as unescaped ones:

    https://github.com/freebsd/freebsd-src/commit/adeebf4cd47c3e85155d92f386bda5e519b75ab2
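
For anyone who wants to check what a given libc does with the two
spellings quoted above, here is a minimal sketch using only the POSIX
regcomp()/regexec()/regerror() calls. The helper names (compile_ere,
ere_matches, describe_error) and the shortened patterns are made up for
illustration; they are not taken from the Android test or from the NetBSD
sources.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile `pattern` as a POSIX ERE and return
 * regcomp()'s result code (0 on success, a REG_* error otherwise). */
int compile_ere(const char *pattern)
{
    regex_t re;
    int rc;

    assert(pattern != NULL);
    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc == 0)
        regfree(&re);
    return rc;
}

/* Hypothetical helper: 1 if `pattern` matches somewhere in `text`,
 * 0 if it does not, -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Hypothetical helper: print whether `pattern` compiles, using
 * regerror() to render the libc's own message on failure. */
void describe_error(const char *pattern)
{
    regex_t re;
    char msg[128];
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    if (rc == 0) {
        printf("accepted: %s\n", pattern);
        regfree(&re);
    } else {
        regerror(rc, &re, msg, sizeof(msg));
        printf("rejected: %s (%s)\n", pattern, msg);
    }
}
```

Calling describe_error("{0x[0-9a-f]{2}}") and
describe_error("\\{0x[0-9a-f]{2}\\}") from a small main() should show
whether the local libc still accepts the unescaped form. The result for
the unescaped pattern is deliberately not asserted here, since POSIX
leaves it undefined and implementations differ.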

Best,

christos


