NetBSD-Bugs archive

standards/59606: od(1) prints wrong output with `-t c' and `-t a'



>Number:         59606
>Category:       standards
>Synopsis:       od(1) prints wrong output with `-t c' and `-t a'
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    standards-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 22 23:25:00 +0000 2025
>Originator:     Taylor R Campbell
>Release:        9, 10
>Organization:
The NetBSD Foundation
>Environment:
>Description:

	POSIX sez:

	> The type specifier character c specifies that bytes shall
	> be interpreted as characters specified by the current
	> setting of the LC_CTYPE locale category.  Characters listed
	> in the table in XBD 5. File Format Notation
	> <https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/basedefs/V1_chap05.html#tag_05>
	> ('\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v') shall be
	> written as the corresponding escape sequences, except that
	> <backslash> shall be written as a single <backslash> and a
	> NUL shall be written as '\0'.

	However, `LC_ALL=C od -t c' prints `007' for `\a' and `013'
	for `\v'.
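
	To make the rule concrete, here is a minimal sketch of the
	single-byte `-t c' mapping in the C locale, checked against the
	expected output under How-To-Repeat.  The names format_c and row
	are invented for illustration and are not taken from od(1); the
	octal fallback and the row layout are read off the transcripts in
	this report rather than from the paragraph quoted above.

```python
# Sketch only: single-byte `-t c' fields for the C locale.
# format_c and row are hypothetical names, not od(1) internals.

# Escapes listed in the quoted POSIX text; NUL is the special case '\0'.
ESCAPES = {
    0x00: r"\0",
    0x07: r"\a",
    0x08: r"\b",
    0x09: r"\t",
    0x0A: r"\n",
    0x0B: r"\v",
    0x0C: r"\f",
    0x0D: r"\r",
}


def format_c(byte: int) -> str:
    """Return the `-t c' field for one byte in the C locale."""
    if byte in ESCAPES:
        return ESCAPES[byte]
    if 0x20 <= byte <= 0x7E:
        # Printable ASCII, including a lone backslash, is written as itself.
        return chr(byte)
    # Anything else is shown as three octal digits, as in the transcripts.
    return f"{byte:03o}"


def row(offset: int, chunk: bytes) -> str:
    """Lay out one line the way the transcripts in this report do."""
    fields = " ".join(format_c(b).rjust(3) for b in chunk)
    return f"{offset:07o}  {fields}"


if __name__ == "__main__":
    data = bytes(range(256))
    for off in range(0, len(data), 16):
        print(row(off, data[off:off + 16]))
    print(f"{len(data):07o}")
```

	Run as a script, this prints a 256-byte dump whose first row
	carries `\a' and `\v' where od(1) currently prints `007' and
	`013'.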

	Also, the `-t c' specifier is bizarrely specified to print
	only the first byte of multibyte sequences, and then replace
	the subsequent bytes by `**':

	> Printable multi-byte characters shall be written in the
	> area corresponding to the first byte of the character; the
	> two-character sequence "**" shall be written in the area
	> corresponding to each remaining byte in the character, as
	> an indication that the character is continued.

	But instead each byte is printed separately.
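
	A rough sketch of the `**' continuation rule, under loud
	assumptions: the locale encoding is UTF-8, Python's
	str.isprintable() stands in for the locale's idea of a printable
	character, and every name below is invented for illustration.
	The quoted sentence puts the character itself in the first slot,
	while the expected output under How-To-Repeat shows the first
	byte in octal there, so the sketch leaves that choice to a flag
	(show_char).

```python
# Sketch only: `-t c' fields for UTF-8 input with `**' continuation marks.
# Assumes a UTF-8 locale; str.isprintable() approximates "printable".

ESCAPES = {0x00: r"\0", 0x07: r"\a", 0x08: r"\b", 0x09: r"\t",
           0x0A: r"\n", 0x0B: r"\v", 0x0C: r"\f", 0x0D: r"\r"}


def utf8_length(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte (1 if not a lead byte)."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 1


def single(byte: int) -> str:
    """Field for a byte that is not part of a printable multibyte character."""
    if byte in ESCAPES:
        return ESCAPES[byte]
    if 0x20 <= byte <= 0x7E:
        return chr(byte)
    return f"{byte:03o}"


def format_c_utf8(data: bytes, show_char: bool = True) -> list[str]:
    """Return one field per input byte, marking continuation bytes as `**'."""
    fields = []
    i = 0
    while i < len(data):
        n = utf8_length(data[i])
        if n > 1:
            try:
                ch = data[i:i + n].decode("utf-8")
            except UnicodeDecodeError:
                ch = None  # truncated or malformed: treat bytes one by one
            if ch is not None and ch.isprintable():
                # First slot: the character, or the first byte in octal.
                fields.append(ch if show_char else f"{data[i]:03o}")
                fields.extend(["**"] * (n - 1))
                i += n
                continue
        fields.append(single(data[i]))
        i += 1
    return fields
```

	For the input in How-To-Repeat, format_c_utf8(b"\317\200\n",
	show_char=False) yields the fields 317, ** and \n.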

	Also, the `-t a' type specifier is bizarrely specified to
	use only the least significant _seven_ bits of each input
	byte, but od(1) currently heeds the eighth bit too:

	> The type specifier character a specifies that bytes shall
	> be interpreted as named characters from the International
	> Reference Version (IRV) of the ISO/IEC 646:1991
	> standard.  Only the least significant seven bits of each
	> byte shall be used for this type specification.

	(Whether we actually want to follow this bizarre
	specification is a separate question, but if not, the
	discrepancy should be documented.  In any case, the primary
	value of a _standard_ od(1) rather than the more flexible but
	less standard hexdump(1) is that the _standard_ od(1)
	produces the same output -- or almost the same output, except
	for \012 which can be `nl' or `lf' with `-t a' -- on every
	system.  Except it doesn't.)
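
	A minimal sketch of the seven-bit rule for `-t a', assuming the
	character names shown in the transcripts in this report.  The
	function name format_a and its newline_name parameter are
	invented for illustration; the parameter only models the `nl'
	versus `lf' latitude mentioned above.

```python
# Sketch only: `-t a' field for one byte, using just the low seven bits.
# Names for 0x00..0x20 are copied from the od(1) output in this report.
NAMES = ["nul", "soh", "stx", "etx", "eot", "enq", "ack", "bel",
         "bs", "ht", "nl", "vt", "ff", "cr", "so", "si",
         "dle", "dc1", "dc2", "dc3", "dc4", "nak", "syn", "etb",
         "can", "em", "sub", "esc", "fs", "gs", "rs", "us", "sp"]


def format_a(byte: int, newline_name: str = "nl") -> str:
    """Return the `-t a' field for one byte."""
    low = byte & 0x7F  # discard the most significant bit
    if low == 0x0A:
        return newline_name  # `nl' or `lf', per the note above
    if low <= 0x20:
        return NAMES[low]
    if low == 0x7F:
        return "del"
    return chr(low)
```

	With the mask in place, bytes 0200 through 0377 produce the same
	fields as bytes 0000 through 0177.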

	References:

	https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/utilities/od.html
>How-To-Repeat:

	For `-t c' with control characters:

	$ awk 'BEGIN { for (i = 0; i < 256; i++) printf "%c", i }' | od -t c
	0000000   \0 001 002 003 004 005 006 007  \b  \t  \n 013  \f  \r 016 017
	0000020  020 021 022 023 024 025 026 027 030 031 032 033 034 035 036 037
	0000040        !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
	0000060    0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
	0000100    @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
	0000120    P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
	0000140    `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
	0000160    p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ 177
	0000200  200 201 202 203 204 205 206 207 210 211 212 213 214 215 216 217
	0000220  220 221 222 223 224 225 226 227 230 231 232 233 234 235 236 237
	0000240  240 241 242 243 244 245 246 247 250 251 252 253 254 255 256 257
	0000260  260 261 262 263 264 265 266 267 270 271 272 273 274 275 276 277
	0000300  300 301 302 303 304 305 306 307 310 311 312 313 314 315 316 317
	0000320  320 321 322 323 324 325 326 327 330 331 332 333 334 335 336 337
	0000340  340 341 342 343 344 345 346 347 350 351 352 353 354 355 356 357
	0000360  360 361 362 363 364 365 366 367 370 371 372 373 374 375 376 377
	0000400

	Expected output:

	0000000   \0 001 002 003 004 005 006  \a  \b  \t  \n  \v  \f  \r 016 017
	0000020  020 021 022 023 024 025 026 027 030 031 032 033 034 035 036 037
	0000040        !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
	0000060    0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
	0000100    @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
	0000120    P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
	0000140    `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
	0000160    p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ 177
	0000200  200 201 202 203 204 205 206 207 210 211 212 213 214 215 216 217
	0000220  220 221 222 223 224 225 226 227 230 231 232 233 234 235 236 237
	0000240  240 241 242 243 244 245 246 247 250 251 252 253 254 255 256 257
	0000260  260 261 262 263 264 265 266 267 270 271 272 273 274 275 276 277
	0000300  300 301 302 303 304 305 306 307 310 311 312 313 314 315 316 317
	0000320  320 321 322 323 324 325 326 327 330 331 332 333 334 335 336 337
	0000340  340 341 342 343 344 345 346 347 350 351 352 353 354 355 356 357
	0000360  360 361 362 363 364 365 366 367 370 371 372 373 374 375 376 377
	0000400

	For `-t c' with multibyte sequences:

	$ printf '\317\200\n' | LC_CTYPE=C.UTF-8 od -t c
	0000000  317 200  \n
	0000003

	Expected output:

	0000000  317  **  \n
	0000003

	For `-t a':

	$ awk 'BEGIN { for (i = 0; i < 256; i++) printf "%c", i }' | od -t a
	0000000  nul soh stx etx eot enq ack bel  bs  ht  nl  vt  ff  cr  so  si
	0000020  dle dc1 dc2 dc3 dc4 nak syn etb can  em sub esc  fs  gs  rs  us
	0000040   sp   !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
	0000060    0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
	0000100    @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
	0000120    P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
	0000140    `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
	0000160    p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ del
	0000200   80  81  82  83  84  85  86  87  88  89  8a  8b  8c  8d  8e  8f
	0000220   90  91  92  93  94  95  96  97  98  99  9a  9b  9c  9d  9e  9f
	0000240   a0  a1  a2  a3  a4  a5  a6  a7  a8  a9  aa  ab  ac  ad  ae  af
	0000260   b0  b1  b2  b3  b4  b5  b6  b7  b8  b9  ba  bb  bc  bd  be  bf
	0000300   c0  c1  c2  c3  c4  c5  c6  c7  c8  c9  ca  cb  cc  cd  ce  cf
	0000320   d0  d1  d2  d3  d4  d5  d6  d7  d8  d9  da  db  dc  dd  de  df
	0000340   e0  e1  e2  e3  e4  e5  e6  e7  e8  e9  ea  eb  ec  ed  ee  ef
	0000360   f0  f1  f2  f3  f4  f5  f6  f7  f8  f9  fa  fb  fc  fd  fe  ff
	0000400

	Expected output:

	0000000  nul soh stx etx eot enq ack bel  bs  ht  nl  vt  ff  cr  so  si
	0000020  dle dc1 dc2 dc3 dc4 nak syn etb can  em sub esc  fs  gs  rs  us
	0000040   sp   !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
	0000060    0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
	0000100    @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
	0000120    P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
	0000140    `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
	0000160    p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ del
	0000200  nul soh stx etx eot enq ack bel  bs  ht  nl  vt  ff  cr  so  si
	0000220  dle dc1 dc2 dc3 dc4 nak syn etb can  em sub esc  fs  gs  rs  us
	0000240   sp   !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
	0000260    0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
	0000300    @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
	0000320    P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
	0000340    `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
	0000360    p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ del
	0000400

	(Note: For bytes 0012 and 0212, `lf' is allowed by POSIX
	too, so an automatic test should perhaps accept that option.)
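
	One way an automatic test could accept either spelling is to
	normalize the output before comparing.  This is only a sketch:
	the helper name is invented, and it also collapses runs of
	whitespace, which is an assumption about what such a test would
	care about.

```python
# Sketch only: make `lf' and `nl' compare equal in `od -t a' output.
def normalize_newline_name(od_output: str) -> str:
    """Rewrite `lf' fields as `nl' and collapse whitespace between fields."""
    lines = []
    for line in od_output.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        offset, fields = parts[0], parts[1:]
        fields = ["nl" if f == "lf" else f for f in fields]
        lines.append(" ".join([offset, *fields]))
    return "\n".join(lines)
```

	A test would then compare normalize_newline_name(actual) with
	normalize_newline_name(expected) instead of the raw strings.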

>Fix:

	Yes, please!

	The \a vs 007 and \v vs 013 part is currently causing the
	postfix tests to fail on NetBSD.  (The multibyte sequence
	handling is also breaking things, but the postfix tests
	expect nonstandard output there too, and use nonstandard
	options to boot.)


