Subject: lib/20873: UTF8 mbrtowc() doesn't return -1 when given illegal UTF8 sequence
To: None <gnats-bugs@gnats.netbsd.org>
From: None <khym@azeotrope.org>
List: netbsd-bugs
Date: 03/24/2003 04:03:05
>Number:         20873
>Category:       lib
>Synopsis:       UTF8 mbrtowc() doesn't return -1 when given illegal UTF8 sequence
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 24 02:04:00 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     Dave Huang
>Release:        NetBSD-current as of March 23, 2003
>Organization:
Name: Dave Huang         |  Mammal, mammal / their names are called /
INet: khym@azeotrope.org |  they raise a paw / the bat, the cat /
FurryMUCK: Dahan         |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 27 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++
>Environment:
	
	
System: NetBSD fluff.azeotrope.org 1.6P NetBSD 1.6P (FLUFF) #10: Sun Mar  9 21:06:23 CST 2003  khym@fluff.azeotrope.org:/usr/obj.i386/FLUFF i386
Architecture: i386
Machine: i386
     $NetBSD: citrus_utf8.c,v 1.6 2002/03/28 10:53:49 yamt Exp $
>Description:
	mbrtowc(3) is supposed to return (size_t)-1 if it's given an
illegal multibyte sequence. However, if the locale is set to a UTF-8
locale, the return value is never set on that error path, so the
caller gets garbage.
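
For context, here is a minimal sketch of the kind of caller that depends
on the documented (size_t)-1 return. count_mb_chars is a hypothetical
helper written for this illustration, not part of libc; with the bug
described above, the garbage return value would be used as a byte count.

```c
#include <string.h>
#include <wchar.h>

/*
 * Count the characters in the first n bytes of s under the current
 * locale. Returns -1 on an illegal or incomplete sequence.
 * count_mb_chars is a hypothetical helper, not a library function.
 */
static long
count_mb_chars(const char *s, size_t n)
{
	mbstate_t st;
	long count = 0;
	size_t r;

	memset(&st, 0, sizeof(st));	/* initial conversion state */
	while (n > 0) {
		r = mbrtowc(NULL, s, n, &st);
		if (r == (size_t)-1 || r == (size_t)-2)
			return -1;	/* illegal or incomplete sequence */
		if (r == 0)
			break;		/* reached a null character */
		s += r;			/* r bytes were consumed */
		n -= r;
		count++;
	}
	return count;
}
```

If mbrtowc() returns an arbitrary value instead of (size_t)-1, neither
error check fires and the loop advances s by that value.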
>How-To-Repeat:

#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
	/* 0xa7 can never be the first byte of a UTF-8 sequence */
	char s[] = "\xa7";
	wchar_t wc;
	mbstate_t mbstate;
	size_t r;

	setlocale(LC_ALL, "");

	/* start from the initial conversion state */
	memset(&mbstate, 0, sizeof(mbstate));

	/* clear errno so the value printed below comes from mbrtowc() */
	errno = 0;
	r = mbrtowc(&wc, s, strlen(s), &mbstate);
	printf("mbrtowc returned %d, errno = %d\n", (int)r, errno);

	return 0;
}

% env LC_ALL=en_US.UTF-8 ./test_mbrtowc
mbrtowc returned 536973768, errno = 85

>Fix:
Index: citrus_utf8.c
===================================================================
RCS file: /cvsroot/src/lib/libc/citrus/modules/citrus_utf8.c,v
retrieving revision 1.6
diff -u -r1.6 citrus_utf8.c
--- citrus_utf8.c	2002/03/28 10:53:49	1.6
+++ citrus_utf8.c	2003/03/24 09:47:15
@@ -276,6 +276,7 @@
 
 ilseq:
 	psenc->chlen = 0;
+	*nresult = (size_t)-1;
 	return (EILSEQ);
 
 restart:
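
A simplified sketch of why the one-line change matters, assuming the
caller returns the value stored through nresult regardless of the error
code. toy_decode and toy_mbrtowc are hypothetical stand-ins invented for
this illustration (the toy decoder accepts only single-byte ASCII); they
are not the actual citrus code.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the module's decoder: returns an
 * errno-style code and reports the mbrtowc() result via *nresult.
 * This toy version accepts only single-byte (ASCII) input.
 */
static int
toy_decode(const char *s, size_t n, size_t *nresult)
{
	unsigned char c;

	if (n == 0) {
		*nresult = (size_t)-2;	/* incomplete: need more bytes */
		return 0;
	}
	c = (unsigned char)s[0];
	if (c >= 0x80) {
		/* the error path has to store a result too */
		*nresult = (size_t)-1;
		return EILSEQ;
	}
	*nresult = (c == 0) ? 0 : 1;
	return 0;
}

/* Hypothetical wrapper: copies the code to errno, returns the result. */
static size_t
toy_mbrtowc(const char *s, size_t n)
{
	size_t ret;
	int err;

	err = toy_decode(s, n, &ret);
	if (err != 0)
		errno = err;
	return ret;
}
```

Without the assignment on the error path, ret in the wrapper would be
returned uninitialized, which matches the garbage value seen in the test
output.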
>Release-Note:
>Audit-Trail:
>Unformatted: