Subject: 1.3.2 seg fault in shared lib?
To: None <current-users@netbsd.org>
From: Simon J. Gerraty <sjg@quick.com.au>
List: current-users
Date: 03/08/1999 23:35:16
This is a bit odd, I'd appreciate a sanity check.

A tool linked against shared libs gets a seg fault on return from a
function, yet the same tool linked statically works fine.

I've done the obvious things: a make clean and make of all the libs
involved (other than libc) to ensure that the static and shared libs
contain the same code, bumped my stack limit up, run ldconfig,
...

Running under gdb and doing a stack trace just prior to return shows
no sign of stack corruption, but we get a seg fault at exactly the
same point every time...

If someone can think of another obvious avenue to check I'd appreciate
it.

In case it is of interest, ktrace output from the static tool shows:

 17561 noid     CALL  read(0x6,0x32008,0x2000)
 17561 noid     GIO   fd 6 read 8192 bytes
       "
        -- Extended Mosy format from
        --   SMIC version 1.0.9, July 23, 1992.
...
[hmm, must fix that, it's actually a much enhanced SMIC :-)]
...
        -- Extended Mosy OID tree
        -- From:   SNMPv2-SMI
        ccitt                0                regPt
        zeroDotZero          ccitt.0          regPt
        iso                  1                regPt
        org                  iso.3            regPt
        dod                  org.6            regPt
        internet             dod.1            regPt
        directory            internet.1       regPt
        mgmt                 internet.2       regPt
        mib-2                mgmt.1           regPt
        -- From:   SNMPv2-MIB
        system               mib-2.1          regPt
...
 17561 noid     RET   read 8192/0x2000
 17561 noid     CALL  break(0x9f800)
 17561 noid     RET   break 0

whereas in the dynamic version we get (for the last bit):

 16935 noid     RET   read 8192/0x2000
 16935 noid     PSIG  SIGSEGV SIG_DFL
 16935 noid     NAMI  "noid.core"

$ ldd obj/noid
obj/noid:
        -lsnmp2.0 => /usr/lib/libsnmp2.so.0.2 (0x4001a000)
        -lsjg.1 => /usr/lib/libsjg.so.1.2 (0x40030000)
        -ldmalloc.2 => /usr/lib/libdmalloc.so.2.0 (0x40044000)
        -lc.12 => /usr/lib/libc.so.12.20 (0x4004a000)
$ obj/noid
Memory fault (core dumped) 
: sjg:508; gdb obj/noid noid.core 
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i386-netbsd), Copyright 1996 Free Software Foundation, Inc...
Core was generated by `noid'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/libexec/ld.so...done.
Reading symbols from /usr/lib/libsnmp2.so.0.2...done.
Reading symbols from /usr/lib/libsjg.so.1.2...done.
Reading symbols from /usr/lib/libdmalloc.so.2.0...done.
Reading symbols from /usr/lib/libc.so.12.20...done.
#0  0x40026f7b in yyparse ()
    at /u0/share/arch/NetBSD/i386/src/sjg/snmp/lib/snmp2/mosy.y:165
165                     free($1);
(gdb) l 160
155             ;
156
157     mib_objid
158             : STRING STRING '.' NUMBER NL {
159                     addNode($1,$2,$4,"OBJID","","");
160                     free($1);
161                     free($2);
162             }
163             | STRING STRING '.' NUMBER REGPT NL {
164                     addNode($1,$2,$4,"OBJID","","");

[here is where we die]

165                     free($1);
166                     free($2);
167             }
168             | STRING NUMBER NL {
169                     addNode($1,"",$2,"OBJID","","");
170                     free($1);
171             }
172             | STRING NUMBER REGPT NL {
173                     MosyVersion = SMIC_EMOSY;
174                     addNode($1,"",$2,"OBJID","","");
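The mib_objid actions above imply an ownership contract: each STRING
value is heap memory owned by the action, and it is freed right after
addNode() returns, so addNode() has to copy anything it keeps. A
minimal sketch of that contract, assuming the lexer strdup()s its
tokens (the scanner isn't shown here); add_node() and struct node are
hypothetical stand-ins for the real addNode() and tree:

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node record; not the real SMIC tree structure. */
struct node {
    char *name;
    char *parent;
    int   number;
};

static struct node nodes[64];
static int nnodes;

/* Hypothetical add_node(): stores private copies of its string
 * arguments, so the caller may free them as soon as it returns. */
static int add_node(const char *name, const char *parent, int number)
{
    struct node *n;

    if (nnodes >= (int)(sizeof nodes / sizeof nodes[0]))
        return -1;
    n = &nodes[nnodes];
    n->name = strdup(name);
    n->parent = strdup(parent);
    n->number = number;
    if (n->name == NULL || n->parent == NULL) {
        free(n->name);
        free(n->parent);
        n->name = n->parent = NULL;
        return -1;
    }
    return nnodes++;
}

/* Mirrors the "STRING STRING '.' NUMBER REGPT NL" action:
 * s1 and s2 are the strdup()ed $1 and $2, n is $4. */
static int reduce_objid(char *s1, char *s2, int n)
{
    int idx;

    assert(s1 != NULL && s2 != NULL);
    idx = add_node(s1, s2, n);
    free(s1);   /* safe only because add_node() kept copies */
    free(s2);
    return idx;
}
```

If the real addNode() stored the pointers instead of copying them,
the free() calls would leave dangling pointers in the tree, which a
checking allocator may or may not notice immediately.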

Setting a breakpoint in addNode shows that it is on return from
dealing with:

        mgmt                 internet.2       regPt

that we die, every time.  libdmalloc is a fascist beast that picks up
things like free()ing non-allocated memory or already free()'d memory
(except on Solaris, where that is expected :-), and it also checks for
overruns, etc.
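For anyone unfamiliar with how a checking allocator catches bad
free() calls, here is a minimal sketch of the basic idea: keep a
table of live allocations and refuse any free() of a pointer not in
it. The dbg_malloc()/dbg_free() names are hypothetical; this is not
libdmalloc's API or implementation, which does a good deal more
(overrun detection among other things).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_LIVE 1024

static void *live[MAX_LIVE];   /* pointers currently handed out */

/* Hypothetical checking malloc: records each pointer it returns. */
static void *dbg_malloc(size_t size)
{
    void *p = malloc(size);
    int slot = -1;

    if (p == NULL)
        return NULL;
    for (int i = 0; i < MAX_LIVE; i++) {
        assert(live[i] != p);   /* malloc never returns a block that is still live */
        if (slot < 0 && live[i] == NULL)
            slot = i;
    }
    if (slot < 0) {
        free(p);                /* table full: fail rather than lose track */
        return NULL;
    }
    live[slot] = p;
    return p;
}

/* Hypothetical checking free: returns 0 on success, -1 if p was never
 * allocated here or has already been freed. */
static int dbg_free(void *p)
{
    if (p == NULL)
        return 0;               /* free(NULL) is a no-op */
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i] == p) {
            live[i] = NULL;
            free(p);
            return 0;
        }
    }
    fprintf(stderr, "dbg_free: %p not allocated or already freed\n", p);
    return -1;
}
```

With this, a second dbg_free() of the same pointer is reported rather
than corrupting the heap. Because a freed address can come back from a
later malloc(), real checkers also hold freed blocks back for a while;
this sketch does not.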

By the time we die, the same bit of code has been exercised for
zeroDotZero, iso, org, dod, internet and directory before being called
for mgmt.  

A stack overrun is all I can think of, and I cannot find any evidence
for it.  Input welcome.
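One crude way to look for a stack overrun without relying on gdb is
to bracket the suspect call with a guard pattern and check it before
returning. A minimal sketch; call_guarded() and its guard size are
hypothetical. The compiler decides the stack layout, so the guard is
not guaranteed to sit next to whatever might be overrun, and a clean
result is weak evidence at best.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define GUARD_BYTE 0xA5
#define GUARD_LEN  64

/* Returns 1 if every guard byte still holds the fill pattern. */
static int guard_intact(const volatile unsigned char *g, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (g[i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Hypothetical wrapper: fills a local guard array, runs work(arg),
 * then checks the guard.  Returns 0 if intact, -1 if clobbered. */
static int call_guarded(void (*work)(void *), void *arg)
{
    volatile unsigned char guard[GUARD_LEN];

    assert(work != NULL);
    for (size_t i = 0; i < GUARD_LEN; i++)
        guard[i] = GUARD_BYTE;

    work(arg);

    if (!guard_intact(guard, GUARD_LEN)) {
        fprintf(stderr, "call_guarded: stack guard clobbered\n");
        return -1;
    }
    return 0;
}
```

A -1 from call_guarded() would point at something scribbling on the
stack; a 0 proves very little, for the layout reason given above.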

BTW, the actual C code from y.tab.c is:

		addNode(yyvsp[-5].s,yyvsp[-4].s,yyvsp[-2].i,"OBJID","","");
		free(yyvsp[-5].s);
		free(yyvsp[-4].s);
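The yyvsp offsets follow from how yacc lays out its value stack: when
a rule with n right-hand-side symbols is reduced, yyvsp points at the
value of the last symbol, so $k is yyvsp[k - n]. For the six-symbol
rule STRING STRING '.' NUMBER REGPT NL that gives $1 = yyvsp[-5],
$2 = yyvsp[-4] and $4 = yyvsp[-2], matching the generated code above.
A small sketch of that arithmetic; the YYSTYPE union here is a
hypothetical stand-in for the %union in mosy.y, which evidently has
.s and .i members:

```c
#include <assert.h>

/* Hypothetical value type standing in for mosy.y's %union. */
typedef union {
    char *s;
    int   i;
} YYSTYPE;

/* Offset of $k from yyvsp when reducing a rule with n RHS symbols. */
static int dollar_offset(int k, int n)
{
    assert(k >= 1 && k <= n);
    return k - n;
}

/* Fetch $k given yyvsp, which points at the value of the last symbol. */
static YYSTYPE *dollar(YYSTYPE *yyvsp, int k, int n)
{
    return &yyvsp[dollar_offset(k, n)];
}
```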

but even when built without the optimizer, gdb says:

Address of symbol "yyvsp" is unknown.

--sjg