NetBSD-Bugs archive
standards/59606: od(1) prints wrong output with `-t c' and `-t a'
>Number: 59606
>Category: standards
>Synopsis: od(1) prints wrong output with `-t c' and `-t a'
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: standards-manager
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Aug 22 23:25:00 +0000 2025
>Originator: Taylor R Campbell
>Release: 9, 10
>Organization:
The \012etBSD Foundation
>Environment:
>Description:
POSIX sez:
> The type specifier character c specifies that bytes shall
> be interpreted as characters specified by the current
> setting of the LC_CTYPE locale category. Characters listed
> in the table in XBD 5. File Format Notation
> <https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/basedefs/V1_chap05.html#tag_05>
> ('\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v') shall be
> written as the corresponding escape sequences, except that
> <backslash> shall be written as a single <backslash> and a
> NUL shall be written as '\0'.
However, `LC_ALL=C od -t c' prints `007' for `\a' and `013'
for `\v'.
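A narrower check for just this part, as a rough sketch: the
helper name `has_c_escapes' is hypothetical (it is not part of
od(1) or of any existing test suite). It reads `od -An -t c'
output on standard input and succeeds only if both `\a' and
`\v' appear as tokens, so it fails on the `007'/`013' behaviour
described above.

```shell
# Hypothetical helper: read `od -An -t c' output on stdin and succeed
# only if the POSIX escape tokens \a and \v both appear as fields.
has_c_escapes() {
    awk '
        { for (i = 1; i <= NF; i++) seen[$i] = 1 }
        END { exit !(("\\a" in seen) && ("\\v" in seen)) }
    '
}

# Intended use (the result depends on the od(1) under test):
#   printf '\a\v' | LC_ALL=C od -An -t c | has_c_escapes && echo ok
```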
Also, the `-t c' specifier is bizarrely specified to print
only the first byte of multibyte sequences, and then replace
the subsequent bytes by `**':
> Printable multi-byte characters shall be written in the
> area corresponding to the first byte of the character; the
> two-character sequence "**" shall be written in the area
> corresponding to each remaining byte in the character, as
> an indication that the character is continued.
But instead each byte is printed separately.
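To make the `**' rule concrete, here is a small sketch that
counts how many `**' cells would follow the first byte's cell.
It assumes the encoding is UTF-8 (as with LC_CTYPE=C.UTF-8) and
that the sequence is valid and printable; the helper name
`utf8_continuations' is made up for illustration, and a real
implementation would more likely go through mbrtowc(3).

```shell
# Hypothetical helper: given a UTF-8 lead byte as a decimal number,
# print how many continuation bytes follow it, i.e. how many `**'
# cells POSIX calls for after the first byte's cell.
# Assumption: UTF-8 only; bytes that cannot start a multibyte
# sequence (ASCII, continuation bytes, invalid leads) yield 0.
utf8_continuations() {
    if   [ "$1" -ge 194 ] && [ "$1" -le 223 ]; then echo 1
    elif [ "$1" -ge 224 ] && [ "$1" -le 239 ]; then echo 2
    elif [ "$1" -ge 240 ] && [ "$1" -le 244 ]; then echo 3
    else echo 0
    fi
}

utf8_continuations "$((0317))"   # lead byte 0317 (decimal 207): prints 1
```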
Also, the `-t a' type specifier is bizarrely specified to
use only the least significant _seven_ bits of each input
byte, but od(1) currently heeds the eighth bit as well:
> The type specifier character a specifies that bytes shall
> be interpreted as named characters from the International
> Reference Version (IRV) of the ISO/IEC 646:1991
> standard. Only the least significant seven bits of each
> byte shall be used for this type specification.
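A minimal sketch of that seven-bit rule, for illustration only:
the helper `od_a_name' is hypothetical and simply masks the
byte value before looking up the name. It uses `nl' for 012;
POSIX also allows `lf' there.

```shell
# Hypothetical helper: print the `-t a' name POSIX calls for, given a
# byte value as a decimal number (0-255).
od_a_name() {
    awk -v byte="$1" 'BEGIN {
        low  = "nul soh stx etx eot enq ack bel bs ht nl vt ff cr so si"
        high = "dle dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us sp"
        split(low " " high, name, " ")   # name[1] is nul, name[33] is sp
        b = byte % 128                   # keep only the low seven bits
        if (b <= 32)
            print name[b + 1]            # control characters and space
        else if (b == 127)
            print "del"
        else
            printf "%c\n", b             # printable ASCII prints as itself
    }'
}

od_a_name "$((0212))"   # same name as byte 012: prints nl
```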
(Whether we actually want to follow this bizarre
specification is a separate question, but if not, the
discrepancy should be documented. In any case, the primary
value of a _standard_ od(1) rather than the more flexible but
less standard hexdump(1) is that the _standard_ od(1)
produces the same output -- or almost the same output, except
for \012 which can be `nl' or `lf' with `-t a' -- on every
system. Except it doesn't.)
References:
https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/utilities/od.html
>How-To-Repeat:
For `-t c' with control characters:
$ awk 'BEGIN { for (i = 0; i < 256; i++) printf "%c", i }' | od -t c
0000000 \0 001 002 003 004 005 006 007 \b \t \n 013 \f \r 016 017
0000020 020 021 022 023 024 025 026 027 030 031 032 033 034 035 036 037
0000040 ! " # $ % & ' ( ) * + , - . /
0000060 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0000100 @ A B C D E F G H I J K L M N O
0000120 P Q R S T U V W X Y Z [ \ ] ^ _
0000140 ` a b c d e f g h i j k l m n o
0000160 p q r s t u v w x y z { | } ~ 177
0000200 200 201 202 203 204 205 206 207 210 211 212 213 214 215 216 217
0000220 220 221 222 223 224 225 226 227 230 231 232 233 234 235 236 237
0000240 240 241 242 243 244 245 246 247 250 251 252 253 254 255 256 257
0000260 260 261 262 263 264 265 266 267 270 271 272 273 274 275 276 277
0000300 300 301 302 303 304 305 306 307 310 311 312 313 314 315 316 317
0000320 320 321 322 323 324 325 326 327 330 331 332 333 334 335 336 337
0000340 340 341 342 343 344 345 346 347 350 351 352 353 354 355 356 357
0000360 360 361 362 363 364 365 366 367 370 371 372 373 374 375 376 377
0000400
Expected output:
0000000 \0 001 002 003 004 005 006 \a \b \t \n \v \f \r 016 017
0000020 020 021 022 023 024 025 026 027 030 031 032 033 034 035 036 037
0000040 ! " # $ % & ' ( ) * + , - . /
0000060 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0000100 @ A B C D E F G H I J K L M N O
0000120 P Q R S T U V W X Y Z [ \ ] ^ _
0000140 ` a b c d e f g h i j k l m n o
0000160 p q r s t u v w x y z { | } ~ 177
0000200 200 201 202 203 204 205 206 207 210 211 212 213 214 215 216 217
0000220 220 221 222 223 224 225 226 227 230 231 232 233 234 235 236 237
0000240 240 241 242 243 244 245 246 247 250 251 252 253 254 255 256 257
0000260 260 261 262 263 264 265 266 267 270 271 272 273 274 275 276 277
0000300 300 301 302 303 304 305 306 307 310 311 312 313 314 315 316 317
0000320 320 321 322 323 324 325 326 327 330 331 332 333 334 335 336 337
0000340 340 341 342 343 344 345 346 347 350 351 352 353 354 355 356 357
0000360 360 361 362 363 364 365 366 367 370 371 372 373 374 375 376 377
0000400
For `-t c' with multibyte sequences:
$ printf '\317\200\n' | LC_CTYPE=C.UTF-8 od -t c
0000000 317 200 \n
0000003
Expected output:
0000000 317 ** \n
0000003
For `-t a':
$ awk 'BEGIN { for (i = 0; i < 256; i++) printf "%c", i }' | od -t a
0000000 nul soh stx etx eot enq ack bel bs ht nl vt ff cr so si
0000020 dle dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us
0000040 sp ! " # $ % & ' ( ) * + , - . /
0000060 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0000100 @ A B C D E F G H I J K L M N O
0000120 P Q R S T U V W X Y Z [ \ ] ^ _
0000140 ` a b c d e f g h i j k l m n o
0000160 p q r s t u v w x y z { | } ~ del
0000200 80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f
0000220 90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f
0000240 a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af
0000260 b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
0000300 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf
0000320 d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df
0000340 e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef
0000360 f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff
0000400
Expected output:
0000000 nul soh stx etx eot enq ack bel bs ht nl vt ff cr so si
0000020 dle dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us
0000040 sp ! " # $ % & ' ( ) * + , - . /
0000060 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0000100 @ A B C D E F G H I J K L M N O
0000120 P Q R S T U V W X Y Z [ \ ] ^ _
0000140 ` a b c d e f g h i j k l m n o
0000160 p q r s t u v w x y z { | } ~ del
0000200 nul soh stx etx eot enq ack bel bs ht nl vt ff cr so si
0000220 dle dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us
0000240 sp ! " # $ % & ' ( ) * + , - . /
0000260 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0000300 @ A B C D E F G H I J K L M N O
0000320 P Q R S T U V W X Y Z [ \ ] ^ _
0000340 ` a b c d e f g h i j k l m n o
0000360 p q r s t u v w x y z { | } ~ del
0000400
(Note: For bytes 0012 and 0212, `lf' is allowed by POSIX
too, so an automatic test should perhaps accept that option.)
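One way an automatic test might accept either spelling, as a
sketch only: pass both the actual and the expected `-t a'
output through a filter that rewrites `lf' to `nl' before
comparing. The name `normalize_od_a' is hypothetical, and the
filter also collapses column padding to single spaces, so it
has to be applied to both sides of the comparison.

```shell
# Hypothetical filter: treat `lf' and `nl' as the same token in
# `od -t a' output. Field 1 is the offset and is left alone.
normalize_od_a() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i == "lf")
                $i = "nl"
        $1 = $1     # force the record to be rebuilt with single spaces
        print
    }'
}

# Intended use (file name is a placeholder):
#   od -t a input | normalize_od_a > actual.txt
```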
>Fix:
Yes, please!
The \a vs 007 and \v vs 013 part is currently causing the
postfix tests to fail on NetBSD. (The multibyte sequence
handling is also breaking things, but the postfix tests expect
nonstandard output there too, and use nonstandard options to
boot.)