Subject: Re: tales of woe from the recent tools/compat_defs.h pullups on netbsd-1-6
To: Greg A. Woods <woods@weird.com>
From: James Chacon <jmc@NetBSD.org>
List: tech-toolchain
Date: 06/21/2004 20:49:20
On Mon, Jun 21, 2004 at 06:34:17PM -0400, Greg A. Woods wrote:
> So, I just spent a good 6 hours or so trying to debug what turned out to
> be a missing prototype for strtouq() in the host tools environment.
Try pulling rev 1.40 of tools/compat/compat_defs.h into your tree.
I'm testing it on the 1.6 branch now and will get it pulled up provided nothing
breaks.
James
> I'll document my experience just in case it proves helpful to anyone
> facing similar problems someday.
>
> This all starts out because in my local source tree I have some changes
> in pw_scan() that allow me to have larger IDs.
>
> Sometime in the last week or so something changed such that now the
> tools version of nbpwd_mkdb fails when handling the following
> master.passwd entry, but oddly the new system pwd_mkdb, built in the
> same build environment, still works just fine.
>
> nfsanon:*:4294967294:4294967294::0:0:NFS anonymous user:/nonexistant:/sbin/nologin
>
> I.e. I now have UID_MAX and GID_MAX set as follows in <sys/syslimits.h>:
>
> #define GID_MAX (~((uid_t)0)) /* max value for a gid_t (2^32) */
> #define UID_MAX (~((uid_t)0)) /* max value for a uid_t (2^32) */
>
> It seemed the best way to implement this while retaining full error
> detection for bad values would be to use the next larger sized unsigned
> integer when parsing the input numbers (since strtoul() can't, or at
> least our version doesn't, do reliable overflow detection when the
> maximum value expected is the same as ULONG_MAX) and so my pw_scan() now
> does this:
>
> ...
> u_quad_t id;
> ...
> id = strtouq(p, &ep, 10);
> ...
> if (id > (u_quad_t) UID_MAX) {
> if (!(inflags & _PASSWORD_NOWARN))
> warnx("uid too big, '%s' > %llu", p, (u_quad_t) UID_MAX);
>
> The result for pwd_mkdb is normal but nbpwd_mkdb does as follows
>
> $ ./nbpwd_mkdb -p -L -d ~/tmp ~/tmp/etc/master.passwd
> pwd_mkdb: uid too big, '4294967294' > 4294967295
> pwd_mkdb: at line #53
> pwd_mkdb: /home/proven/woods/tmp/etc/master.passwd: Inappropriate file type or format
>
> Now the assertion in the first message looks a little confusing/wrong,
> but that's because the first number is actually just the string that was
> passed to strtouq(), but the second number is the conversion of UID_MAX.
>
> So, "What the heck happened?" I asked myself. This all worked yesterday
> on the Alpha, and last week on this same i386 machine!
>
> I haven't made any source changes in anything I would expect to be
> related -- I've only run "cvs update" from the netbsd-1-6 branch, and
> the only change I can see that's been made since the last successful
> build are the changes to tools/compat/compat_defs.h, and as it turns out
> this is indeed the cause.
>
> I walked through the execution of both nbpwd_mkdb and pwd_mkdb with gdb
> and was surprised to see that in the former (only) the return value of
> strtouq() changed magically from the expected '4294967294' just before
> the 'ret' to a wildly wrong value that was assigned to the "id" variable
> and this seems to be because the top four bytes of the value are wrong
>
> (gdb) print id
> $1 = 13816982054056755198
> (gdb) print /x id
> $2 = 0xbfbfc7f8fffffffe
>
> At first I didn't think this looked like a sign extension bug.
>
> Initially I had been trying to use '-march=pentiumpro' in both
> HOST_CFLAGS and CFLAGS and it took me some time to realize that wasn't
> the problem.
>
> Then I checked the post-processed code, which is identical except for
> the the "__nbcompat_" prefix on the function name, including the
> typedefs for "u_quad_t" which work back to "unsigned long long int" in
> both compilation envrionments as expected.
>
> The only obvious difference was of course is which compiler was used.
> However the weird thing is that the host compiler was built by a
> previous revision of the same source tree over a year ago and it
> obviously built a working pwd_mkdb back then, and it also worked for
> full i386-i386 and i386-sparc builds a week or so ago, so unless bit-rot
> had set in behind my back, nothing had changed recently in the host
> compiler. I.e. I would have been less surprised if the new "tools"
> compiler generated the buggy code, but that doesn't seem to be what
> happened because if I run the latest build of pwd_mkdb from
> /usr/src/usr.sbin/pwd_mkdb/obj.i386 it works just fine (as the
> disassembly above of the strtouq() call from pw_scan() would suggest it
> should).
>
> Now my i386 assembly language skills were never as good as my pdp-11 and
> 6502 and even vax assembly skills, but I soon discovered what seemed to
> be a bad botch in the code generated in the broken version of the call:
>
> 0x8049ac3 <__nbcompat_pw_scan+159>: add $0xfffffffc,%esp
> 0x8049ac6 <__nbcompat_pw_scan+162>: push $0xa
> 0x8049ac8 <__nbcompat_pw_scan+164>: lea 0xfffffffc(%ebp),%ebx
> 0x8049acb <__nbcompat_pw_scan+167>: push %ebx
> 0x8049acc <__nbcompat_pw_scan+168>: push %esi
> 0x8049acd <__nbcompat_pw_scan+169>: call 0x8053b3c <strtouq>
> 0x8049ad2 <__nbcompat_pw_scan+174>: mov %eax,%edx
> 0x8049ad4 <__nbcompat_pw_scan+176>: mov %eax,%ecx
> 0x8049ad6 <__nbcompat_pw_scan+178>: sar $0x1f,%ecx
>
> I.e. what the heck is that 'sar' instruction doing in there!?!?!? Ah
> ha! It _is_ a sign extension bug!
>
> The good version in the new libc.a is of course just:
>
> 0x80499a3 <pw_scan+159>: add $0xfffffffc,%esp
> 0x80499a6 <pw_scan+162>: push $0xa
> 0x80499a8 <pw_scan+164>: lea 0xfffffffc(%ebp),%ebx
> 0x80499ab <pw_scan+167>: push %ebx
> 0x80499ac <pw_scan+168>: push %esi
> 0x80499ad <pw_scan+169>: call 0x804a284 <_strtouq>
> 0x80499b2 <pw_scan+174>: mov %edx,%ecx
> 0x80499b4 <pw_scan+176>: mov %eax,%edx
>
> Finally with this knowledge I went back to the "cvs update" and using
> CHANGES-1.6.3 as a guide I soon zeroed in on the tools/compat_defs.h
> pullup and sure enough I soon realized that with that change the host
> tools environment suddenly had no prototype for strtouq(), etc. and thus
> the silent sign extension occurs.
>
> Don't you think it would be a good idea to add some compiler warnings to
> the host tools build (at least when the compiler is GCC)?
>
> -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Werror
>
> This would have saved me a lot of time and headache because it would
> have pointed out immediately (or at least after the tools/compat_defs.h
> pullup was merged into my tree), that I had added a call to a
> non-standard function in code that was used in the host tools
> environment.
>
> For my immediate purposes my fix is simply to add a prototype for
> strtouq() to pw_scan.c since I only do builds on NetBSD iteslf. I'll
> back out the compat_defs.h changes just to be sure too though.....
>
> I suppose that if I wanted to keep the host tools portable I would need
> to support an optional strtouq() function to the nbcompat library.....
>
> --
> Greg A. Woods
>
> +1 416 218-0098 VE3TCP RoboHack <woods@robohack.ca>
> Planix, Inc. <woods@planix.com> Secrets of the Weird <woods@weird.com>
>
>