NetBSD-Bugs archive


lib/58209: <cctype> lacks compile-time diagnostics for char abuse



>Number:         58209
>Category:       lib
>Synopsis:       <cctype> lacks compile-time diagnostics for char abuse
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Apr 28 15:00:02 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10, 9, ...
>Organization:
The NetBSD std::isfoundation
>Environment:
>Description:
The <cctype> functions, such as std::isprint/isdigit/isalpha and std::toupper/tolower, have a singularly troublesome specification: Their argument has type int, but they are only defined on inputs that are either (a) the value of the EOF macro (which on NetBSD is -1), or (b) representable by unsigned char.  In other words, there are exactly 257 allowed inputs: {-1, 0, 1, 2, 3, ..., 255}.  Any other inputs lead to undefined behaviour.

This is because they are meant for use with I/O functions like std::istream::peek, which return either an unsigned char value converted to int, or EOF:

int ch;
while ((ch = std::cin.peek()) != EOF) {
        if (std::isspace(ch))
                ...
}

Using them to process arbitrary contents of, e.g., std::string requires explicit conversion to unsigned char:

std::string s = ...;
for (std::size_t i = 0; i < s.size(); i++) {
        if (std::isspace(static_cast<unsigned char>(s[i])))
                ...
}

Without this conversion, on machines where char is signed, such as x86, passing a char value outside the 7-bit US-ASCII range either (a) has undefined behaviour, or (b) in the case of the all-bits-set octet, is conflated with EOF.
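
The two cases can be illustrated without actually invoking undefined behaviour.  What follows is only a sketch, assuming 8-bit char; the helper names as_passed_directly, as_passed_safely, and in_ctype_domain are made up for illustration and are not part of any library:

```cpp
#include <climits>	// UCHAR_MAX
#include <cstdio>	// EOF

// Hypothetical helpers, for illustration only.

// The int a <cctype> function receives when handed a char directly:
// the usual integral conversion, which sign-extends if char is signed.
constexpr int
as_passed_directly(char c)
{
	return c;
}

// The int it receives after the explicit conversion to unsigned char.
constexpr int
as_passed_safely(char c)
{
	return static_cast<unsigned char>(c);
}

// True if v is one of the inputs the <cctype> functions are defined on:
// EOF, or a value representable by unsigned char.
constexpr bool
in_ctype_domain(int v)
{
	return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}
```

Where char is signed and 8 bits wide, as_passed_directly(static_cast<char>(0xb5)) is -75, which is outside the domain, and as_passed_directly(static_cast<char>(0xff)) is -1, which on NetBSD is EOF.  as_passed_safely gives 181 and 255 respectively, both inside the domain.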

Our standard C <ctype.h> definitions are crafted to trigger the -Wchar-subscripts compiler warning, by defining, e.g., isspace(c) as a macro that expands into ((_ctype_tab_ + 1)[c] & bits).  But that doesn't work with C++; we can't expand `std::isspace(c)' into `std::((_ctype_tab_ + 1)[c] & bits)'.  So C++ code with ctype abuse (like https://github.com/ledger/ledger/issues/2340) gets no compile-time feedback, and bad runtime feedback (https://gnats.netbsd.org/58208) leading to simply confusing behaviour (like https://github.com/ledger/ledger/issues/2338).
>How-To-Repeat:
#include <cctype>
#include <iostream>
#include <string>

int
main()
{
        std::string s = {static_cast<char>(0xb5), 0};
        std::cout << std::isspace(s[0]) << std::endl;
}
>Fix:
Maybe we can teach <cctype> to overload isspace &c., or find some template magic, that will trigger a warning at compile-time.
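
One possible shape for the overload idea, as a rough sketch only: the namespace `checked' is hypothetical, this is not what any existing <cctype> does, and whether such overloads could be added to namespace std itself is a separate question.  It relies on the standard [[deprecated]] attribute to produce the diagnostic, and it assumes that making the char overloads do the safe conversion is acceptable, which changes what an all-bits-set char means (255 rather than EOF):

```cpp
#include <cctype>

namespace checked {	// hypothetical namespace, for illustration only

// Normal entry point: int arguments (EOF or an unsigned char value).
// unsigned char arguments also land here, by integral promotion.
inline int
isspace(int c)
{
	return std::isspace(c);
}

// char and signed char arguments select these overloads by exact match,
// so each such call site gets a deprecation warning.
[[deprecated("convert to unsigned char before calling isspace")]]
inline int
isspace(char c)
{
	// Assumption: do the safe conversion rather than pass c through.
	return std::isspace(static_cast<unsigned char>(c));
}

[[deprecated("convert to unsigned char before calling isspace")]]
inline int
isspace(signed char c)
{
	return std::isspace(static_cast<unsigned char>(c));
}

}	// namespace checked
```

With this, checked::isspace(static_cast<unsigned char>(s[0])) compiles quietly, while checked::isspace(s[0]) for a std::string s should draw a deprecation warning from compilers that implement the attribute.  Declaring the char overloads `= delete' instead would turn the warning into a hard error.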


