NetBSD-Bugs archive


Re: bin/59395: sh mangles some UTF-8 input



The following reply was made to PR bin/59395; it has been noted by GNATS.

From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: bin/59395: sh mangles some UTF-8 input
Date: Mon, 05 May 2025 01:16:23 +0700

     Date:        Sun,  4 May 2025 15:25:00 +0000 (UTC)
     From:        campbell+netbsd%mumble.net@localhost
     Message-ID:  <20250504152500.D69BC1A923F%mollari.NetBSD.org@localhost>
 
   | This may have already been fixed in 10 --
 
 It very likely has, but I will check it.
 
   | But it's possible the underlying issue is, say, ctype(3) abuse,
 
 It isn't likely to be that; sh barely uses <ctype.h> (and probably
 should be fixed to not use it at all).
 
   | So I'd like to make sure the root cause is understood and resolved
   | before we close this PR.
 
 If that did happen (assuming it has since been fixed), the cause is
 clear: sh uses byte values in the range 0x81..0x8x (for some x I can't
 remember) in the input stream as internal markers for anything that
 isn't just a literal character (0x81 is the rough analog of \ in the
 input, for example; another byte indicates a var expansion, another a
 command sub, etc).
 
 When one of those bytes appears as input (as in <c3 81>) sh escapes
 the byte concerned (0x81 in this case) with its internal escape char
 (0x81 ...) which is where the <c3 81 81> and <c3 81 82> originate.
 
 All of that is exactly as intended.
 
 The bug would have been in not removing the escape char before
 allowing the data to escape from sh.   There were problems in that
 area, but I believe (hope) they were all fixed some time ago now.
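
 As a rough illustration of the mechanism described above (not sh's
 actual code, which is C), here is a minimal Python sketch. The names
 CTLESC, ASSUMED_CTL_LAST, escape and unescape are made up for this
 example, and the upper bound of the reserved range is a placeholder,
 since the real value of "0x8x" is not stated here. The sketch only
 models the escape byte, not the var expansion or command sub markers.

```python
CTLESC = 0x81            # internal escape byte, as described in the message
ASSUMED_CTL_LAST = 0x88  # ASSUMPTION: placeholder for the unstated "0x8x" bound


def escape(data: bytes, last: int = ASSUMED_CTL_LAST) -> bytes:
    """Prefix every input byte in the reserved range with CTLESC."""
    out = bytearray()
    for b in data:
        if CTLESC <= b <= last:
            out.append(CTLESC)   # mark the next byte as literal data
        out.append(b)
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Drop each CTLESC and keep the byte it protects.

    This is the step that has to happen before data leaves the shell.
    """
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == CTLESC:
            b = next(it, None)   # the protected byte follows the escape
            if b is None:        # dangling escape at end of input
                break
        out.append(b)
    return bytes(out)


if __name__ == "__main__":
    for ch in ("\u00c1", "\u00c2"):   # UTF-8 encodings are <c3 81> and <c3 82>
        raw = ch.encode("utf-8")
        print(raw.hex(" "), "->", escape(raw).hex(" "),
              "->", unescape(escape(raw)).hex(" "))
```

 Under these assumptions, <c3 81> is held internally as <c3 81 81>, and
 output that skips the unescape step leaks that extra 0x81 byte.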
 
 I will check it again.
 
 kre
 

