NetBSD-Bugs archive


Re: lib/47983: libedit segfault at character decoding error (sh autocomplete)



The following reply was made to PR lib/47983; it has been noted by GNATS.

From: Matthew Mondor <mm_lists%pulsar-zone.net@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: lib/47983: libedit segfault at character decoding error (sh
 autocomplete)
Date: Wed, 3 Jul 2013 01:39:12 -0400

 On Mon,  1 Jul 2013 22:05:00 +0000 (UTC)
 Matthew Mondor <mm_lists%pulsar-zone.net@localhost> wrote:
 
 > >Fix:
 >
 > The following diff fixes the problem for me, with the following result:
 >
 > $ ls z\U+00E9
 > z?
 > $
 >
 > Index: lib/libedit/chared.c
 > ===================================================================
 > RCS file: /data/rsync/netbsd-cvs/src/lib/libedit/chared.c,v
 > retrieving revision 1.36
 > diff -u -r1.36 chared.c
 > --- lib/libedit/chared.c     23 Oct 2011 17:37:55 -0000      1.36
 > +++ lib/libedit/chared.c     1 Jul 2013 20:18:23 -0000
 > @@ -612,6 +612,10 @@
 >  {
 >      size_t len;
 >
 > +    /* String may be NULL, as in the case of a character decoding error
 > +     */
 > +    if (s == NULL)
 > +            return -1;
 >      if ((len = Strlen(s)) == 0)
 >              return -1;
 >      if (el->el_line.lastchar + len >= el->el_line.limit) {
 
 Actually, this doesn't work as well as intended.  Interestingly, if
 testing using /rescue/sh, the above works fine, but if using /bin/sh,
 the above results in no characters being supplied (although there is no
 more crash, at least).
 
 I have the impression that the proper way to solve this would be to
 support UTF-8B, a variant of UTF-8 where invalid sequences of octets
 are mapped into part of the UTF-16 surrogate range (D800–DBFF,
 DC00–DFFF), such as DC80-DCFF.  The encoder would also map that special
 range back to the original octets.  This would allow non-destructive
 decoding+encoding cycles and prevent fatal decoding errors, providing a
 more transparent and reliable interface.
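
 The octet-to-surrogate mapping described above can be sketched as
 follows.  This is only an illustration: the helper names are
 hypothetical and exist in neither libedit nor the C library, and it
 assumes that only octets 0x80-0xFF ever need escaping, since lower
 octets are always valid ASCII.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map an undecodable octet (0x80-0xFF) to U+DC80-U+DCFF. */
static uint32_t
utf8b_escape(unsigned char octet)
{
	assert(octet >= 0x80);	/* octets below 0x80 are plain ASCII */
	return 0xDC00u + octet;
}

/* Hypothetical helper: is this code point one of the escaped octets? */
static int
utf8b_is_escape(uint32_t cp)
{
	return cp >= 0xDC80u && cp <= 0xDCFFu;
}

/* Hypothetical helper: recover the original octet when encoding. */
static unsigned char
utf8b_unescape(uint32_t cp)
{
	assert(utf8b_is_escape(cp));
	return (unsigned char)(cp - 0xDC00u);
}
```

 With these, a stray 0xE9 octet would decode to U+DCE9 and encode back
 to 0xE9, so a decode+encode cycle leaves the input unchanged.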
 
 As a quicker solution, I'm tempted to convert invalid UTF-8 sequence
 octets to wchar_t implicitly by assuming they are LATIN-*.
 
 It also remains to be decided whether that should be done in libedit
 or in the C library...  Being able to error on invalid sequences at
 decoding time can be considered a feature, but it should ideally not be
 the only option.  Unfortunately, I think that the current wchar related
 interface does not allow passing a flag for such an option?  Perhaps
 the flag can be part of the locale definition though...
 
 Another option would be to provide a separate function to decode
 strings from char to wchar_t using UTF-8B or implicit LATIN-* coercion,
 and have libedit call it if ct_decode_string() returns NULL.  After all,
 it's input routines which commonly have to deal with potentially
 invalid octet sequences, and that's what libedit does...
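
 A rough sketch of such a fallback, using the standard mbstowcs() in
 place of ct_decode_string() so that it stands alone.  Both function
 names are hypothetical, and the coercion simply assumes each octet is
 a Latin-1 code point; this is not how libedit currently behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical fallback: widen each octet to one wchar_t as if it were
 * Latin-1.  Like mbstowcs(), writes at most n wide characters and
 * returns the count, not including the terminating NUL.
 */
static size_t
latin1_to_wcs(const char *s, wchar_t *dst, size_t n)
{
	size_t i;

	assert(s != NULL && dst != NULL);
	for (i = 0; i < n; i++) {
		dst[i] = (wchar_t)(unsigned char)s[i];
		if (s[i] == '\0')
			break;
	}
	return i;
}

/* Try the locale's decoder first; coerce only if it reports an error. */
static size_t
decode_with_fallback(const char *s, wchar_t *dst, size_t n)
{
	size_t r;

	assert(s != NULL && dst != NULL);
	r = mbstowcs(dst, s, n);
	if (r == (size_t)-1)
		r = latin1_to_wcs(s, dst, n);
	return r;
}
```

 The caller then always gets a usable wide string instead of an error,
 at the cost of possibly misinterpreting octets that were never Latin-1.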
 -- 
 Matt
 


