Source-Changes-HG archive


[src/trunk]: src/lib/libc/gen More fixes from: J.R. Oldroyd



details:   https://anonhg.NetBSD.org/src/rev/2cf6d6271c02
branches:  trunk
changeset: 784949:2cf6d6271c02
user:      christos <christos%NetBSD.org@localhost>
date:      Fri Feb 15 00:28:10 2013 +0000

description:
More fixes from: J.R. Oldroyd
- The input loop control that I changed yesterday to:
        while (mbslength >= 0) {
  is wrong.  There are circumstances where it causes an extra
  \000 to be added at the end of the output.  This error was
  showing in some of my own tests here, but I did not notice it
  yesterday.  (I really need to add my tests to the test suite;
  catching every error by eye is hard.)  To fix, the loop now
  runs while mbslength > 0, and mbslength is incremented before
  the loop only if it is 1 to start with.  (Note that this
  check for "== 1" is why the arg to strvisx() in vis(1) must
  be 1, not mbilen.)

- The cast sequence when manually inserting bytes after a
  multibyte conversion error:
        *src = (wint_t)(u_char)*mbsrc;
  is wrong.  It causes problems when an 8859-1 input string
  is processed in the UTF-8 locale.
  It needs to be:
        *src = (wint_t)*mbsrc;
  Without the (u_char), all the locale mismatch combinations
  work.

- The code:
        if (mblength < len)
                len = mblength;
  needs to be there.  It resets len for the single character
  input case after we've actually processed two input
  characters (c and nextc) because we incremented mbslength
  at the start of the loop.  Without this code, single
  character conversions end up with a \000 or other byte
  appended.
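
For reference, a minimal sketch of how the two casts in the second
item differ.  The helper names byte_via_uchar() and byte_direct() are
hypothetical and do not appear in vis.c.  vis.c uses plain char, whose
signedness is implementation-defined; the sketch assumes a signed char
so that the result is the same everywhere.  It only shows that the two
casts give different values for a byte with the high bit set; it does
not model the rest of the conversion.

```c
#include <assert.h>
#include <wchar.h>

/*
 * Hypothetical helpers for illustration only.  They take signed char
 * explicitly (an assumption) so that the behaviour does not depend on
 * whether plain char is signed on the build platform.
 */

/* Old cast: go through unsigned char first; result is always 0..255. */
static wint_t
byte_via_uchar(signed char b)
{
        return (wint_t)(unsigned char)b;
}

/* New cast: convert the char value directly, keeping its sign. */
static wint_t
byte_direct(signed char b)
{
        return (wint_t)b;
}
```

For a 7-bit value such as 'A' both helpers return the same result.
For a byte such as 0xE9 (which is -23 as a signed char) they do not:
the first returns 233, the second returns the conversion of -23 to
wint_t.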

diffstat:

 lib/libc/gen/vis.c |  61 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 57 insertions(+), 4 deletions(-)

diffs (121 lines):

diff -r c069239a870f -r 2cf6d6271c02 lib/libc/gen/vis.c
--- a/lib/libc/gen/vis.c        Fri Feb 15 00:13:06 2013 +0000
+++ b/lib/libc/gen/vis.c        Fri Feb 15 00:28:10 2013 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: vis.c,v 1.52 2013/02/14 13:57:53 christos Exp $        */
+/*     $NetBSD: vis.c,v 1.53 2013/02/15 00:28:10 christos Exp $        */
 
 /*-
  * Copyright (c) 1989, 1993
@@ -57,7 +57,7 @@
 
 #include <sys/cdefs.h>
 #if defined(LIBC_SCCS) && !defined(lint)
-__RCSID("$NetBSD: vis.c,v 1.52 2013/02/14 13:57:53 christos Exp $");
+__RCSID("$NetBSD: vis.c,v 1.53 2013/02/15 00:28:10 christos Exp $");
 #endif /* LIBC_SCCS and not lint */
 #ifdef __FBSDID
 __FBSDID("$FreeBSD$");
@@ -298,6 +298,20 @@
        _DIAGASSERT(mbsrc != NULL);
        _DIAGASSERT(mbextra != NULL);
 
+       /*
+        * Input (mbsrc) is a char string considered to be multibyte
+        * characters.  The input loop will read this string pulling
+        * one character, possibly multiple bytes, from mbsrc and
+        * converting each to wchar_t in src.
+        *
+        * The vis conversion will be done using the wide char
+        * wchar_t string.
+        *
+        * This will then be converted back to a multibyte string to
+        * return to the caller.
+        */
+
+       /* Allocate space for the wide char strings */
        psrc = pdst = extra = nextra = NULL;
        if (!mblength)
                mblength = strlen(mbsrc);
@@ -312,22 +326,53 @@
        dst = pdst;
        src = psrc;
 
+       /*
+        * Input loop.
+        * Handle up to mblength characters (not bytes).  We do not
+        * stop at NULs because we may be processing a block of data
+        * that includes NULs.  We process one more than the character
+        * count so that we also get the next character of input which
+        * is needed under some circumstances as a look-ahead character.
+        */
        mbslength = (ssize_t)mblength;
-       while (mbslength >= 0) {
+       /*
+        * When inputing a single character, must also read in the
+        * next character for nextc, the look-ahead character.
+        */
+       if (mbslength == 1)
+               mbslength++;
+       while (mbslength > 0) {
+               /* Convert one multibyte character to wchar_t. */
                clen = mbtowc(src, mbsrc, MB_LEN_MAX);
                if (clen < 0) {
-                       *src = (wint_t)(u_char)*mbsrc;
+                       /* Conversion error, process as a byte instead. */
+                       *src = (wint_t)*mbsrc;
                        clen = 1;
                }
                if (clen == 0)
+                       /*
+                        * NUL in input gives 0 return value. process
+                        * as single NUL byte.
+                        */
                        clen = 1;
+               /* Advance output pointer if we still have input left. */
                src++;
+               /* Advance input pointer by number of bytes read. */
                mbsrc += clen;
+               /* Decrement input count */
                mbslength -= clen;
        }
        len = src - psrc;       
        src = psrc;
+       /*
+        * In the single character input case, we will have actually
+        * processed two characters, c and nextc.  Reset len back to
+        * just a single character.
+        */
+       if (mblength < len)
+               len = mblength;
 
+       /* Convert extra argument to list of characters for this mode. */
        mbstowcs(extra, mbextra, strlen(mbextra));
        MAKEEXTRALIST(flag, nextra, extra);
        if (!nextra) {
@@ -340,8 +385,14 @@
                goto out;
        }
 
+       /* Look up which processing function to call. */
        f = getvisfun(flag);
 
+       /*
+        * Main processing loop.
+        * Call do_Xvis processing function one character at a time
+        * with next character available for look-ahead.
+        */
        for (start = dst; len > 0; len--) {
                c = *src++;
                dst = (*f)(dst, c, flag, len >= 1 ? *src : L'\0', nextra);
@@ -351,8 +402,10 @@
                }
        }
 
+       /* Terminate the output string. */
        *dst = L'\0';
 
+       /* Convert wchar_t string back to multibyte output string. */
        len = dlen ? *dlen : ((wcslen(start) + 1) * MB_LEN_MAX);
        olen = wcstombs(mbdst, start, len * sizeof(*mbdst));
 
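
The input loop and the len reset from the hunks above can be tried in
isolation with a sketch like the following.  read_input() is a
hypothetical stand-alone helper, not a function in vis.c; it assumes
that mbsrc is NUL-terminated (so the look-ahead read stays in bounds)
and that out has room for at least mblength + 1 wide characters.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert the input to wide characters the way
 * the patched loop does, and return the count that the main
 * processing loop should iterate over.
 */
static size_t
read_input(const char *mbsrc, size_t mblength, wchar_t *out)
{
        wchar_t *src = out;
        ssize_t mbslength = (ssize_t)mblength;
        size_t len;
        int clen;

        assert(mbsrc != NULL);
        assert(out != NULL);

        /* Single character: also read the look-ahead character. */
        if (mbslength == 1)
                mbslength++;
        while (mbslength > 0) {
                clen = mbtowc(src, mbsrc, MB_LEN_MAX);
                if (clen < 0) {
                        /* Conversion error, process as a byte instead. */
                        *src = (wint_t)*mbsrc;
                        clen = 1;
                }
                if (clen == 0)
                        clen = 1;       /* NUL in input */
                src++;
                mbsrc += clen;
                mbslength -= clen;
        }
        len = (size_t)(src - out);
        /* Two characters were read in the single character case. */
        if (mblength < len)
                len = mblength;
        return len;
}
```

Called as read_input("a", 1, buf), the loop stores both 'a' and the
terminating NUL in buf, but the returned count is reset to 1, so the
NUL is available only as look-ahead and is not processed as input.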


