Re: bin/59657: syslogd outputs BOM in the message

To: gnats-bugs%netbsd.org@localhost

Subject: Re: bin/59657: syslogd outputs BOM in the message

From: Christos Zoulas <christos%zoulas.com@localhost>

Date: Fri, 19 Sep 2025 12:16:39 -0400

SYSLOG(3) and rfc5424 similarly state:

"If the msgfmt contains UTF-8 characters, then it has to start with
a Byte Order Mark."

The BOM is unexpected as a prefix for every message logged:

2025-09-17T01:59:10.205820+01:00 funcube potato - - - <feff>Árvíztűrő tükörfúrógép
How-To-Repeat:
syslogd -o rfc5424 -d

logger $(printf "\xEF\xBB\xBF%s" "Árvíztűrő tükörfúrógép")

tail -n 1 /var/log/messages | xxd
00000000: 3230 3235 2d30 392d 3137 5430 313a 3539  2025-09-17T01:59
00000010: 3a31 302e 3230 3538 3230 2b30 313a 3030  :10.205820+01:00
00000020: 2066 756e 6375 6265 2070 6f74 6174 6f20   funcube potato 
00000030: 2d20 2d20 2d20 efbb bfc3 8172 76c3 ad7a  - - - .....rv..z
00000040: 74c5 b172 c591 2074 c3bc 6bc3 b672 66c3  t..r.. t..k..rf.
00000050: ba72 c3b3 67c3 a970 0a                   .r..g..p.

Why do you say that? The BNF in the RFC says:

MSG = MSG-ANY / MSG-UTF8

MSG-ANY = *OCTET ; not starting with BOM

MSG-UTF8 = BOM UTF-8-STRING

BOM = %xEF.BB.BF
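
To make the grammar concrete, here is a minimal sketch in C of telling MSG-UTF8 apart from MSG-ANY by checking for the three BOM octets. The helper names (msg_has_bom, msg_skip_bom) are made up for illustration and are not taken from syslogd.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* BOM = %xEF.BB.BF, as in the RFC 5424 ABNF. */
static const unsigned char UTF8_BOM[3] = { 0xEF, 0xBB, 0xBF };

/*
 * Hypothetical helper: true if the MSG part starts with the BOM,
 * i.e. it is MSG-UTF8 rather than MSG-ANY.
 */
static bool
msg_has_bom(const char *msg, size_t len)
{
	return len >= sizeof(UTF8_BOM) &&
	    memcmp(msg, UTF8_BOM, sizeof(UTF8_BOM)) == 0;
}

/*
 * Hypothetical helper: pointer to the text after the BOM, or to the
 * start of msg when no BOM is present.
 */
static const char *
msg_skip_bom(const char *msg, size_t len)
{
	return msg_has_bom(msg, len) ? msg + sizeof(UTF8_BOM) : msg;
}
```

Note that a string literal such as "\xEF\xBB\xBF" must be closed before any following text that begins with a hex digit, otherwise the compiler folds those letters into the last escape.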

Now, in practice, according to ChatGPT:

Almost all modern syslog implementations do not emit a BOM, even for UTF-8 content.
Many receivers are tolerant and just assume UTF-8 without requiring BOM.
Some parsers can actually get confused if a BOM is present.

And:

RFC 5424 says the BOM is required if you send UTF-8 MSG.
In practice, it’s usually skipped, and interoperability tends to be better without it.

If your tool (msgfmt) prepends a BOM automatically, you should check the target syslog receiver. If it understands RFC 5424 to the letter, the BOM is technically correct. But if you’re aiming for compatibility with common syslog daemons (rsyslog, syslog-ng, journald forwarders), skipping the BOM is typically safer.

Perhaps we should add a flag to select the behavior? If so, what should the default be?
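
As a rough sketch of what such a switch might look like, assuming a two-valued setting: the enum bom_mode and the function output_offset below are hypothetical names, not existing syslogd code or an existing option. The function only decides where the written text starts. It says nothing about which default is right.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical setting, for illustration only. */
enum bom_mode {
	BOM_KEEP,	/* write the BOM through, per the RFC 5424 grammar */
	BOM_STRIP	/* drop the BOM from the logged text */
};

/*
 * Return the offset into msg at which output should start:
 * 3 when a BOM is present and the mode asks to strip it, else 0.
 */
static size_t
output_offset(const char *msg, size_t len, enum bom_mode mode)
{
	static const char bom[] = "\xEF\xBB\xBF";
	bool has_bom = len >= 3 && memcmp(msg, bom, 3) == 0;

	return (has_bom && mode == BOM_STRIP) ? 3 : 0;
}
```

With BOM_STRIP this gives the same effect as skipping the three bytes in the output, while BOM_KEEP leaves the message untouched.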

christos

Fix:
Index: ./usr.sbin/syslogd/syslogd.c
===================================================================
RCS file: /cvsroot/src/usr.sbin/syslogd/syslogd.c,v
retrieving revision 1.147
diff -u -r1.147 syslogd.c
--- ./usr.sbin/syslogd/syslogd.c        9 Nov 2024 16:31:31 -0000       1.147
+++ ./usr.sbin/syslogd/syslogd.c        17 Sep 2025 01:08:30 -0000
@@ -1243,6 +1243,7 @@
               DPRINTF(D_DATA, "UTF-8 BOM\n");
               utf8allowed = true;
               p += 3;
+               start += 3; /* skip BOM in output */
       }

       if (*p != '\0' && !utf8allowed) {
