NetBSD-Bugs archive


Re: bin/57616: sed(1) is unable to process multibyte unicode characters properly



The following reply was made to PR bin/57616; it has been noted by GNATS.

From: "Fege, Marc Daniel" <marc.fege%uni-bonn.de@localhost>
To: gnats-bugs%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Cc: 
Subject: Re: bin/57616: sed(1) is unable to process multibyte unicode
 characters properly
Date: Mon, 11 Sep 2023 17:40:15 +0200

 
 Hello Michael,
 
 
 thanks a lot for your quick reply.
 
 
 > Wide char support ("NLS") from FreeBSD was integrated in 2021 and
 > will be in NetBSD-10.
 
 That's fantastic news. So it seems I'm a little late with my bug
 report, even though 9.3 is the most recent stable release and the
 issue affects at least the 9.x branch.
 
 However, are there plans to backport this to a possible NetBSD 9.4,
 or will we have to wait for the NetBSD 10 release?
 
 
 > It's not actually about sed failing but what the underlying regexp
 > library can do.
 
 Since I'm just an ordinary user and not a developer, I was unable to
 provide details beyond a surface-level diagnosis. What I see as a
 user, as the front end to all of that underlying machinery, is just a
 program called sed(1). That's why I referred to it in this particular
 use case.
 
 Thanks a lot!
 
 On Monday, 11.09.2023 at 17:05, mlelstv%serpens.de@localhost (michael
 van elst) wrote:
 
 
 
 The following reply was made to PR bin/57616; it has been noted by
 GNATS.
 
 From: mlelstv%serpens.de@localhost (Michael van Elst)
 To: gnats-bugs%netbsd.org@localhost
 Cc:
 Subject: Re: bin/57616: sed(1) is unable to process multibyte unicode
 characters properly
 Date: Mon, 11 Sep 2023 15:03:24 -0000 (UTC)
 
 marc.fege%uni-bonn.de@localhost writes:
 
 >NetBSD rpi 9.3 NetBSD 9.3 (RPI) #0: Thu Aug  4 15:30:37 UTC 2022
 mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/evbarm/compile/RPI
 evbarm
 
 >sed(1) has a problem processing multibyte unicode characters
 properly.
 
 >     echo "abc???xyz" | sed 's/./& /g'
 >I expect the following output format for further processing:
 >     "a b c ? ? ? x y z "
 
 
 It's not actually about sed failing but what the underlying regexp
 library can do.
 
 Wide char support ("NLS") from FreeBSD was integrated in 2021 and
 will be in NetBSD-10.
 
 


