NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
lib/60111: unvis() S_NUMBER Numeric HTML Entity Decoding Issue
>Number: 60111
>Category: lib
>Synopsis: unvis() S_NUMBER Numeric HTML Entity Decoding Issue
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: lib-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Mar 20 10:10:00 +0000 2026
>Originator: Yeo Jong Han
>Release: libbsd v1.44
>Organization:
NIL
>Environment:
>Description:
| Field | Value |
|-------|-------|
| **Type** | Logic Error / Incorrect Computation / Integer Overflow |
| **Severity** | HIGH |
| **CVSS v3.1** | 7.5 (AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N) |
| **Affected File** | `src/unvis.c`, line 471 |
| **Affected Functions** | `unvis()`, `strunvis()`, `strunvisx()`, `strnunvisx()` |
| **Affected Versions** | All versions of libbsd (bug inherited from NetBSD unvis.c v1.44) |
## Description
The `unvis()` function in `src/unvis.c` contains a computation error in the `S_NUMBER` state machine handler, which is responsible for decoding HTML numeric character references (e.g., `A` → `'A'`).
**The buggy code (line 471):**
```c
case S_NUMBER:
if (uc == ';')
return UNVIS_VALID;
if (!isdigit(uc))
goto bad;
*cp += (*cp * 10) + uc - '0'; // BUG: += instead of =
return UNVIS_NOCHAR;
```
**The correct code should be:**
```c
*cp = (*cp * 10) + uc - '0'; // CORRECT: = instead of +=
```
The `+=` operator makes the formula equivalent to:
```
*cp = *cp + (*cp * 10) + digit = 11 * (*cp) + digit
```
Instead of the correct:
```
*cp = 10 * (*cp) + digit
```
This means **every multi-digit numeric HTML entity decodes to the wrong character value**.
## Proof of Exploitation
### Confirmed incorrect decodings:
| Input | Expected | Actual | Expected Char | Actual Char |
|-------|----------|--------|---------------|-------------|
| ` ` | 10 (LF) | 11 (VT) | newline | vertical tab |
| ` ` | 32 | 35 | space | `#` |
| `"` | 34 | 37 | `"` | `%` |
| `<` | 60 | 66 | `<` | `B` |
| `>` | 62 | 68 | `>` | `D` |
| `A` | 65 | 71 | `A` | `G` |
| `a` | 97 | 106 | `a` | `j` |
| `ÿ` | 255 | 46 | ÿ | `.` (overflow!) |
### Three-digit overflow:
For `ÿ`:
- Digit `2`: *cp = 0*11 + 2 = 2
- Digit `5`: *cp = 2*11 + 5 = 27
- Digit `5`: *cp = 27*11 + 5 = 302 → truncated to `char` → **46** (`'.'`)
The signed `char` overflow on line 471 is undefined behavior per the C standard.
## Security Impact
1. **Filter Bypass**: If `unvis`/`strunvis` is used to decode user input before security filtering (e.g., checking for `<`, `>`, `"`, `&`), the wrong decoded values will **bypass** the filter. For example, `<` should decode to `<` but instead decodes to `B`, so a `<` check would not trigger.
2. **Data Corruption**: Any application using VIS_HTTP1866 mode to decode data will get incorrect results for all multi-digit numeric entities.
3. **Undefined Behavior**: The `char` overflow for values ≥ 12 (at 3 digits, the result exceeds 127) is undefined behavior per C standard.
>How-To-Repeat:
## Reproduction
```bash
gcc -o poc poc.c -I/usr/local/include -L/usr/local/lib -lbsd
./poc
```
`poc`
```C
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <bsd/vis.h>
static int test_entity(const char *input, int expected_val, const char *desc) {
char output[256];
int state = 0;
char c;
int ret;
int failed = 0;
memset(output, 0, sizeof(output));
ret = strnunvisx(output, sizeof(output), input, VIS_HTTP1866);
if (ret < 0) {
printf(" [FAIL] %-20s -> strnunvisx() returned error\n", desc);
return 1;
}
int actual_val = (unsigned char)output[0];
if (actual_val != expected_val) {
printf(" [BUG] %-20s input=%-12s expected=%3d (0x%02x '%c') "
"actual=%3d (0x%02x",
desc, input, expected_val, expected_val,
(expected_val >= 32 && expected_val < 127) ? expected_val : '.',
actual_val, actual_val);
if (actual_val >= 32 && actual_val < 127)
printf(" '%c'", actual_val);
printf(")\n");
failed = 1;
} else {
printf(" [OK] %-20s input=%-12s value=%3d (0x%02x)\n",
desc, input, actual_val, actual_val);
}
return failed;
}
int main(void) {
int bugs = 0;
printf("=== Testing ===\n\n");
printf("--- Single-digit entities ---\n");
bugs += test_entity("�", 0, "NUL (�)");
bugs += test_entity("	", 9, "TAB (	)");
printf("\n--- Two-digit entities ---\n");
bugs += test_entity(" ", 10, "LF ( )");
bugs += test_entity(" ", 13, "CR ( )");
bugs += test_entity(" ", 32, "Space ( )");
bugs += test_entity(""", 34, "Quote (")");
bugs += test_entity("&", 38, "Amp (&)");
bugs += test_entity("<", 60, "LT (<)");
bugs += test_entity(">", 62, "GT (>)");
bugs += test_entity("A", 65, "A (A)");
bugs += test_entity("Z", 90, "Z (Z)");
bugs += test_entity("a", 97, "a (a)");
printf("\n--- Three-digit entities (possible char overflow) ---\n");
bugs += test_entity("d", 100, "d (d)");
bugs += test_entity("z", 122, "z (z)");
bugs += test_entity("", 127, "DEL ()");
bugs += test_entity(" ", 160, "NBSP ( )");
bugs += test_entity("ÿ", 255, "yuml (ÿ)");
printf("\n--- Summary ---\n");
if (bugs > 0) {
printf("[!] %d entities decoded to INCORRECT values.\n", bugs);
printf("[!] The *cp += formula (×11) diverges from correct *cp = formula (×10)\n");
} else {
printf("All entities decoded correctly (no bug found).\n");
}
return bugs > 0 ? 1 : 0;
}
```
>Fix:
## Recommended Fix
```diff
case S_NUMBER:
if (uc == ';')
return UNVIS_VALID;
if (!isdigit(uc))
goto bad;
- *cp += (*cp * 10) + uc - '0';
+ *cp = (*cp * 10) + uc - '0';
return UNVIS_NOCHAR;
```
Additionally, bounds checking should be added to prevent overflow for values > 255.
Home |
Main Index |
Thread Index |
Old Index