Subject: bin/19832: /bin/sh has internationalization issues
To: None <>
From: Martin Husemann <>
List: netbsd-bugs
Date: 01/13/2003 10:23:40
>Number:         19832
>Category:       bin
>Synopsis:       /bin/sh has internationalization issues
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 13 01:24:00 PST 2003
>Originator:     Martin Husemann
>Release:        NetBSD 1.6L
System: NetBSD 1.6L NetBSD 1.6L (PORTER) #0: Sat Jan 4 12:45:09 CET 2003 i386
Architecture: i386
Machine: i386

The /bin/sh code uses two magic constants generated by mksyntax: PEOF and UPEOF.
They must denote the same value, but PEOF appears to be an integer, while UPEOF
is that same value as a char. UPEOF is never used directly; PEOF is, and
some macros (also generated by mksyntax) test against UPEOF.
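
To illustrate why an int constant and a char constant have to be kept in
step, here is a minimal sketch. The helper names eq_signed and eq_unsigned
are hypothetical and do not appear in the sh sources; the values -1 and 255
are picked only for illustration and are not claimed to be what mksyntax
emits.

```c
/*
 * Compare a character against an int constant. The character is
 * promoted to int first, so the result depends on whether the
 * character type is signed or unsigned.
 */
int
eq_signed(signed char c, int k)
{
	return c == k;		/* c promotes to a value in -128..127 */
}

int
eq_unsigned(unsigned char c, int k)
{
	return c == k;		/* c promotes to a value in 0..255 */
}
```

With these helpers, eq_signed((signed char)-1, -1) is true, but
eq_unsigned((unsigned char)-1, -1) is false, because the unsigned byte
promotes to 255. The same bit pattern therefore needs a different integer
constant depending on the signedness of char.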

PEOF is used as an out-of-band character in a zero-terminated char* buffer, for
example to mark the end of a here-document. This means PEOF must be != '\0'
and must not be a valid character inside a here-document.

No such character exists, IMHO.
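
A minimal sketch of the collision, assuming a hypothetical sentinel byte of
0xFF (an illustrative value only, not necessarily the one mksyntax
generates) and a hypothetical helper sentinel_len that is not part of the
sh sources:

```c
#include <stddef.h>

/* Hypothetical in-band end marker, chosen for illustration only. */
#define SENTINEL ((unsigned char)0xFF)

/*
 * Return the number of bytes before the first sentinel byte, or
 * before the terminating NUL, whichever comes first.
 */
size_t
sentinel_len(const char *buf)
{
	size_t n = 0;

	while (buf[n] != '\0' && (unsigned char)buf[n] != SENTINEL)
		n++;
	return n;
}
```

For a buffer holding the bytes 'a', 'b', 0xFF, 'c', 'd' followed by the
sentinel, sentinel_len returns 2 rather than 5: the data byte is mistaken
for the end marker. In ISO 8859-1 the byte 0xFF is a printable letter, so
such input is plausible.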

The arbitrary value currently chosen for PEOF is a valid printable character
in some locales on machines where char == unsigned char. If char == signed
char, it is a non-printable, non-alphabetic character in all locales I know
of, but there is no guarantee of this property, and I'm not sure whether it
would forbid the character from occurring in here-documents.

code inspection

Rototill the code to pass lengths around instead of relying on sentinels?
Maybe do it completely and make it multi-byte character safe?
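
A rough sketch of what passing lengths around could look like. The struct
lenbuf and the function lenbuf_getc are hypothetical names invented for this
example and are not taken from the sh sources. End of input is reported
through the int return value, in the style of getc(3), so every byte value
remains usable as data. This sketch does not address multi-byte characters.

```c
#include <stddef.h>

/* Hypothetical counted buffer: a pointer plus the bytes remaining. */
struct lenbuf {
	const char *data;
	size_t len;
};

/*
 * Return the next byte as a value in 0..255, or -1 once the buffer
 * is exhausted. No byte value is reserved as an end marker.
 */
int
lenbuf_getc(struct lenbuf *b)
{
	if (b->len == 0)
		return -1;
	b->len--;
	return (unsigned char)*b->data++;
}
```

Reading a four-byte buffer holding 'a', 0xFF, '\0', 'b' through lenbuf_getc
yields those four values in order and then -1, so neither the NUL nor the
0xFF byte ends the input early.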