NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

lib/60111: unvis() S_NUMBER Numeric HTML Entity Decoding Issue



>Number:         60111
>Category:       lib
>Synopsis:       unvis() S_NUMBER Numeric HTML Entity Decoding Issue
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Mar 20 10:10:00 +0000 2026
>Originator:     Yeo Jong Han
>Release:        libbsd v1.44
>Organization:
NIL
>Environment:
>Description:
| Field | Value |
|-------|-------|
| **Type** | Logic Error / Incorrect Computation / Integer Overflow |
| **Severity** | HIGH |
| **CVSS v3.1** | 7.5 (AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N) |
| **Affected File** | `src/unvis.c`, line 471 |
| **Affected Functions** | `unvis()`, `strunvis()`, `strunvisx()`, `strnunvisx()` |
| **Affected Versions** | All versions of libbsd (bug inherited from NetBSD unvis.c v1.44) |

## Description

The `unvis()` function in `src/unvis.c` contains a computation error in the `S_NUMBER` state machine handler, which is responsible for decoding HTML numeric character references (e.g., `A` → `'A'`).

**The buggy code (line 471):**
```c
case S_NUMBER:
    if (uc == ';')
        return UNVIS_VALID;
    if (!isdigit(uc))
        goto bad;
    *cp += (*cp * 10) + uc - '0';   // BUG: += instead of =
    return UNVIS_NOCHAR;
```

**The correct code should be:**
```c
    *cp = (*cp * 10) + uc - '0';    // CORRECT: = instead of +=
```

The `+=` operator makes the formula equivalent to:
```
*cp = *cp + (*cp * 10) + digit = 11 * (*cp) + digit
```
Instead of the correct:
```
*cp = 10 * (*cp) + digit
```

This means **every multi-digit numeric HTML entity decodes to the wrong character value**.

## Proof of Exploitation

### Confirmed incorrect decodings:

| Input | Expected | Actual | Expected Char | Actual Char |
|-------|----------|--------|---------------|-------------|
| `
` | 10 (LF) | 11 (VT) | newline | vertical tab |
| ` ` | 32 | 35 | space | `#` |
| `"` | 34 | 37 | `"` | `%` |
| `&#60;` | 60 | 66 | `<` | `B` |
| `&#62;` | 62 | 68 | `>` | `D` |
| `&#65;` | 65 | 71 | `A` | `G` |
| `&#97;` | 97 | 106 | `a` | `j` |
| `&#255;` | 255 | 46 | ÿ | `.` (overflow!) |

### Three-digit overflow:
For `&#255;`:
- Digit `2`: *cp = 0*11 + 2 = 2
- Digit `5`: *cp = 2*11 + 5 = 27
- Digit `5`: *cp = 27*11 + 5 = 302 &#8594; truncated to `char` &#8594; **46** (`'.'`)

The signed `char` overflow on line 471 is undefined behavior per the C standard.

## Security Impact

1. **Filter Bypass**: If `unvis`/`strunvis` is used to decode user input before security filtering (e.g., checking for `<`, `>`, `"`, `&`), the wrong decoded values will **bypass** the filter. For example, `&#60;` should decode to `<` but instead decodes to `B`, so a `<` check would not trigger.

2. **Data Corruption**: Any application using VIS_HTTP1866 mode to decode data will get incorrect results for all multi-digit numeric entities.

3. **Undefined Behavior**: The `char` overflow for values &#8805; 12 (at 3 digits, the result exceeds 127) is undefined behavior per C standard.

>How-To-Repeat:
## Reproduction

```bash
gcc -o poc poc.c -I/usr/local/include -L/usr/local/lib -lbsd
./poc
```

`poc`
```C
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <bsd/vis.h>

static int test_entity(const char *input, int expected_val, const char *desc) {
    char output[256];
    int state = 0;
    char c;
    int ret;
    int failed = 0;

    memset(output, 0, sizeof(output));

    ret = strnunvisx(output, sizeof(output), input, VIS_HTTP1866);
    if (ret < 0) {
        printf("  [FAIL] %-20s -> strnunvisx() returned error\n", desc);
        return 1;
    }

    int actual_val = (unsigned char)output[0];
    if (actual_val != expected_val) {
        printf("  [BUG]  %-20s  input=%-12s  expected=%3d (0x%02x '%c')  "
               "actual=%3d (0x%02x",
               desc, input, expected_val, expected_val,
               (expected_val >= 32 && expected_val < 127) ? expected_val : '.',
               actual_val, actual_val);
        if (actual_val >= 32 && actual_val < 127)
            printf(" '%c'", actual_val);
        printf(")\n");
        failed = 1;
    } else {
        printf("  [OK]   %-20s  input=%-12s  value=%3d (0x%02x)\n",
               desc, input, actual_val, actual_val);
    }
    return failed;
}

int main(void) {
    int bugs = 0;

    printf("=== Testing ===\n\n");

    printf("--- Single-digit entities ---\n");
    bugs += test_entity("&#0;",   0,   "NUL (&#0;)");
    bugs += test_entity("&#9;",   9,   "TAB (&#9;)");

    printf("\n--- Two-digit entities ---\n");
    bugs += test_entity("&#10;",  10,  "LF  (&#10;)");
    bugs += test_entity("&#13;",  13,  "CR  (&#13;)");
    bugs += test_entity("&#32;",  32,  "Space (&#32;)");
    bugs += test_entity("&#34;",  34,  "Quote (&#34;)");
    bugs += test_entity("&#38;",  38,  "Amp (&#38;)");
    bugs += test_entity("&#60;",  60,  "LT  (&#60;)");
    bugs += test_entity("&#62;",  62,  "GT  (&#62;)");
    bugs += test_entity("&#65;",  65,  "A   (&#65;)");
    bugs += test_entity("&#90;",  90,  "Z   (&#90;)");
    bugs += test_entity("&#97;",  97,  "a   (&#97;)");

    printf("\n--- Three-digit entities (possible char overflow) ---\n");
    bugs += test_entity("&#100;", 100, "d   (&#100;)");
    bugs += test_entity("&#122;", 122, "z   (&#122;)");
    bugs += test_entity("&#127;", 127, "DEL (&#127;)");
    bugs += test_entity("&#160;", 160, "NBSP (&#160;)");
    bugs += test_entity("&#255;", 255, "yuml (&#255;)");

    printf("\n--- Summary ---\n");
    if (bugs > 0) {
        printf("[!] %d entities decoded to INCORRECT values.\n", bugs);
        printf("[!] The *cp += formula (×11) diverges from correct *cp = formula (×10)\n");
    } else {
        printf("All entities decoded correctly (no bug found).\n");
    }

    return bugs > 0 ? 1 : 0;
}
```
>Fix:
## Recommended Fix

```diff
 case S_NUMBER:
     if (uc == ';')
         return UNVIS_VALID;
     if (!isdigit(uc))
         goto bad;
-    *cp += (*cp * 10) + uc - '0';
+    *cp = (*cp * 10) + uc - '0';
     return UNVIS_NOCHAR;
```

Additionally, bounds checking should be added to prevent overflow for values > 255.




Home | Main Index | Thread Index | Old Index