So, I can now report I've been a victim of my own aging eyes and clumsiness. :-)

In summary, the problem was due to accidentally typing an errant character into a source file while browsing it (sometime back in February). Worse yet, I saved the file without knowing I had done so, and then had the bad luck that the character did not trigger any errors during compilation.

The errant character was a tilde ('~'), and it landed at the beginning of line #122 in src/lib/libc/gen/ctype_.c. Unfortunately for me this did not generate a syntax error, but instead just changed the value of one entry in the _ctype_tab_ table (the one for the space character).

The long version of the story is that after all the previously mentioned problems with debuggers, etc., I started debugging by inserting some better error messages in usr.bin/hexdump/parse.c to see if I could discover exactly what line the problem was occurring on, and sure enough it seemed to be with the <ctype.h> macros. Then I found I was able to work around the problem by locally defining a naive version of isdigit(). I have not verified this, but it probably worked because the new value for the space character in _ctype_tab_ now identified it as a digit, and my naive replacement does not consult the table.

The final mystery is why the affected programs work when run with either a newer kernel, or on amd64.

Although I can reproduce the bug in hexdump, I cannot seem to reproduce it exactly. I.e. if I reproduce the bug by locally defining _ctype_tab_ et al with the errant value, then hexdump, when compiled for i386, exhibits the same problem on both i386 and amd64, with both matching and newer kernels. In other words the reproduced bug does not disappear in the scenarios where it disappeared before.

The old buggy binary still exhibits the bug only on a real i386 with a matching kernel, and of course it still works OK on both amd64 with a matching kernel and on a real i386 with a newer kernel.
Keep in mind this is a static-linked binary.

Here's the buggy version working fine on a real i386 with a newer kernel:

	$ uname -a
	NetBSD once.local 9.0 NetBSD 9.0 (GENERIC) #0: Fri Feb 14 00:06:28 UTC 2020 mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/i386/compile/GENERIC i386
	$ /more/home/more/woods/tmp/hexdump-
	asdf
	0000000 7361 6664 000a
	0000005
	$ file /more/home/more/woods/tmp/hexdump-
	/more/home/more/woods/tmp/hexdump-: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for NetBSD 8.99.32, stripped

Here's the buggy version working fine on amd64 (with a matching kernel):

	$ uname -a
	NetBSD future 8.99.32 NetBSD 8.99.32 (XEN3_DOMU) #1: Thu Nov 28 18:31:36 PST 2019 woods@future:/build/woods/future/current-amd64-amd64-obj/more/work/woods/m-NetBSD-current/sys/arch/amd64/compile/XEN3_DOMU amd64
	$ ~/tmp/hexdump-
	asdf
	0000000 7361 6664 000a
	0000005

Here's the buggy version failing on a real i386 with a matching kernel:

	$ uname -a
	NetBSD lilbit 8.99.32 NetBSD 8.99.32 (NET5501) #3: Fri May 1 16:55:04 PDT 2020 woods@once.local:/build/woods/once.local/current-i386-i386-ppro-obj/more/work/woods/m-NetBSD-current/sys/arch/i386/compile/NET5501 i386
	$ /more/home/more/woods/tmp/hexdump-
	hexdump-: ""%07.7_ax " 8/2 "%04x " "\n"": bad format

I guess the most interesting test would be to step instruction by instruction through the execution on the real i386 with a newer kernel and see if I can understand how it manages to work.

I don't think I kept a copy of hexdump.debug though -- I may have to rebuild the whole tree with the original error to make that less arduous to do. Oh well, I guess it only takes about 4 hours on my speediest build machine.

-- 
Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675
RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>
Avoncote Farms <woods%avoncote.ca@localhost>