MSG = MSG-ANY / MSG-UTF8
MSG-ANY = *OCTET ; not starting with BOM
MSG-UTF8 = BOM UTF-8-STRING
BOM = %xEF.BB.BF
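For illustration, the grammar above can be read as a receiver-side check: if the first three octets are the BOM, the message is MSG-UTF8 and the text starts after them; otherwise it is MSG-ANY. The following is only a minimal sketch. The helper name `msg_skip_bom` and its signature are hypothetical and are not taken from syslogd.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 byte order mark: BOM = %xEF.BB.BF in the ABNF above. */
static const unsigned char utf8_bom[3] = { 0xEF, 0xBB, 0xBF };

/*
 * Hypothetical helper (not part of syslogd.c).
 * Returns true if msg starts with the BOM (MSG-UTF8), false otherwise
 * (MSG-ANY). In both cases *body and *bodylen describe the message
 * text with any leading BOM skipped.
 */
static bool
msg_skip_bom(const char *msg, size_t len, const char **body, size_t *bodylen)
{
	if (len >= sizeof(utf8_bom) &&
	    memcmp(msg, utf8_bom, sizeof(utf8_bom)) == 0) {
		*body = msg + sizeof(utf8_bom);
		*bodylen = len - sizeof(utf8_bom);
		return true;
	}
	*body = msg;
	*bodylen = len;
	return false;
}
```

A caller would pass the raw MSG octets and then log or forward only `body`, which is roughly the effect of skipping the BOM in the output.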
Now, in practice, according to ChatGPT:
Almost all modern syslog implementations do not emit a BOM, even for UTF-8 content.
Many receivers are tolerant and just assume UTF-8 without requiring BOM.
Some parsers can actually get confused if a BOM is present.
And:
RFC 5424 says the BOM is required if you send UTF-8 MSG.
In practice, it’s usually skipped, and interoperability tends to be better without it.
If your tool (msgfmt) prepends a BOM automatically, you should check the target syslog receiver. If it understands RFC 5424 to the letter, the BOM is technically correct. But if you’re aiming for compatibility with common syslog daemons (rsyslog, syslog-ng, journald forwarders), skipping the BOM is typically safer.
Perhaps add a flag to select the behavior? What should the default be?
christos
Fix:
Index: ./usr.sbin/syslogd/syslogd.c
===================================================================
RCS file: /cvsroot/src/usr.sbin/syslogd/syslogd.c,v
retrieving revision 1.147
diff -u -r1.147 syslogd.c
--- ./usr.sbin/syslogd/syslogd.c 9 Nov 2024 16:31:31 -0000 1.147
+++ ./usr.sbin/syslogd/syslogd.c 17 Sep 2025 01:08:30 -0000
@@ -1243,6 +1243,7 @@
DPRINTF(D_DATA, "UTF-8 BOM\n");
utf8allowed = true;
p += 3;
+ start += 3; /* skip BOM in output */
}
if (*p != '\0' && !utf8allowed) {