NetBSD-Bugs archive


bin/59395: sh mangles some UTF-8 input



>Number:         59395
>Category:       bin
>Synopsis:       sh mangles some UTF-8 input
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun May 04 15:25:00 +0000 2025
>Originator:     Taylor R Campbell
>Release:        9
>Organization:
The AccentBSD Shell Co., Inc.
>Environment:
>Description:
The following shell script with non-US-ASCII, UTF-8 input in a heredoc mangles the data en route to cat:

begin-base64 644 bad.sh
IyEvYmluL3NoCgpjYXQgPDxFT0YKZGlmZiAtciA2ZWE2MmMyNTVjZDkgLXIgZjQx
NDkwMmJjYmIzIMOBLWxhLWNhcnRlL3DDonRpc3NlcmllL2NsYWZvdXRpcwotLS0g
L2Rldi9udWxsCVRodSBKYW4gMDEgMDA6MDA6MDAgMTk3MCArMDAwMAorKysgYi/D
gS1sYS1jYXJ0ZS9ww6J0aXNzZXJpZS9jbGFmb3V0aXMJVGh1IEphbiAwMSAwMDow
MDowMiAxOTcwICswMDAwCkBAIC0wLDAgKzEsMSBAQAorY8OpcmlzZQpkaWZmIC1y
IDZlYTYyYzI1NWNkOSAtciBmNDE0OTAyYmNiYjMgw6EtbGEtY2FydGUvUMOCVElT
U0VSSUUvbW91c3NlCi0tLSAvZGV2L251bGwJVGh1IEphbiAwMSAwMDowMDowMCAx
OTcwICswMDAwCisrKyBiL8OhLWxhLWNhcnRlL1DDglRJU1NFUklFL21vdXNzZQlU
aHUgSmFuIDAxIDAwOjAwOjAyIDE5NzAgKzAwMDAKQEAgLTAsMCArMSwxIEBACitm
cmFpc2UKZGlmZiAtciA2ZWE2MmMyNTVjZDkgLXIgZjQxNDkwMmJjYmIzIMOhLWxh
LWNhcnRlL2JvdWxhbmdlcmllL0Nyb2lzc2FudAotLS0gL2Rldi9udWxsCVRodSBK
YW4gMDEgMDA6MDA6MDAgMTk3MCArMDAwMAorKysgYi/DoS1sYS1jYXJ0ZS9ib3Vs
YW5nZXJpZS9Dcm9pc3NhbnQJVGh1IEphbiAwMSAwMDowMDowMiAxOTcwICswMDAw
CkBAIC0wLDAgKzEsMSBAQAorZXBpbmFyZApkaWZmIC1yIDZlYTYyYzI1NWNkOSAt
ciBmNDE0OTAyYmNiYjMgw6EtbGEtY2FydGUvZW50csOpZS9zb3VwZQotLS0gL2Rl
di9udWxsCVRodSBKYW4gMDEgMDA6MDA6MDAgMTk3MCArMDAwMAorKysgYi/DoS1s
YS1jYXJ0ZS9lbnRyw6llL3NvdXBlCVRodSBKYW4gMDEgMDA6MDA6MDIgMTk3MCAr
MDAwMApAQCAtMCwwICsxLDEgQEAKK2NoYW1waWdub24KZGlmZiAtciA2ZWE2MmMy
NTVjZDkgLXIgZjQxNDkwMmJjYmIzIMOhLWxhLWNhcnRlL3DDonRpc3NlcmllL1ND
SExPU1NFUkJVQkVOCi0tLSAvZGV2L251bGwJVGh1IEphbiAwMSAwMDowMDowMCAx
OTcwICswMDAwCisrKyBiL8OhLWxhLWNhcnRlL3DDonRpc3NlcmllL1NDSExPU1NF
UkJVQkVOCVRodSBKYW4gMDEgMDA6MDA6MDIgMTk3MCArMDAwMApAQCAtMCwwICsx
LDEgQEAKK3ZpZW5ub2lzZQpkaWZmIC1yIDZlYTYyYzI1NWNkOSAtciBmNDE0OTAy
YmNiYjMgw6EtbGEtY2FydGUvcMOidGlzc2VyaWUvY3LDqG1lLWJyw7tsw6llCi0t
LSAvZGV2L251bGwJVGh1IEphbiAwMSAwMDowMDowMCAxOTcwICswMDAwCisrKyBi
L8OhLWxhLWNhcnRlL3DDonRpc3NlcmllL2Nyw6htZS1icsO7bMOpZQlUaHUgSmFu
IDAxIDAwOjAwOjAyIDE5NzAgKzAwMDAKQEAgLTAsMCArMSwxIEBACitzYWZyYW4K
RU9GCg==
====

Running

LC_ALL=C.UTF-8 sh bad.sh | diff -u bad.sh -

shows that:

- in LATIN CAPITAL LETTER A WITH ACUTE (U+00C1), sh has replaced <c3 81> by <c3 81 81>;
- in LATIN CAPITAL LETTER A WITH CIRCUMFLEX (U+00C2), sh has replaced <c3 82> by <c3 81 82>.
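
Both replacement sequences are invalid UTF-8, since a continuation byte (0x81 or 0x82) cannot follow the complete two-byte sequence <c3 81>. That makes them easy to search for in a hex dump. The following is only a rough sketch for checking other inputs: the helper name count_mangled is made up for illustration, and it assumes a grep that supports -o (not required by POSIX, but present on NetBSD and in GNU grep).

```shell
# Hypothetical helper: count the two mangled byte sequences described
# above (<c3 81 81> and <c3 81 82>) in whatever arrives on stdin.
count_mangled() {
    od -An -v -tx1 |            # one two-digit hex token per input byte
        tr '\n' ' ' |           # join od's output lines into one line
        tr -s ' ' |             # squeeze runs of spaces to a single space
        grep -o -e 'c3 81 81' -e 'c3 81 82' |
        wc -l |
        tr -d ' '               # BSD wc pads its count with spaces
}

# Correctly encoded U+00C1 followed by ASCII: nothing to report.
printf '\303\201-la-carte\n' | count_mangled    # prints 0

# The two corruptions from this report, one of each.
printf '\303\201\201 \303\201\202\n' | count_mangled    # prints 2
```

With the script from this report, the equivalent check would be `LC_ALL=C.UTF-8 sh bad.sh | count_mangled`; a nonzero count indicates the problem described here.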

I tried to reduce this test case further and the problem went away.
>How-To-Repeat:
as above
>Fix:
This may have already been fixed in NetBSD 10: I can't reproduce it _with this test case_ there.  But it's possible the underlying issue is, say, ctype(3) abuse, whose symptoms may come and go unpredictably with the input.  So I'd like to make sure the root cause is understood and resolved before we close this PR.


