Subject: Re: pthreads and SIGILL on m68k
To: None <port-m68k@NetBSD.org>
From: Aymeric Vincent <vincent@labri.fr>
List: port-m68k
Date: 11/17/2006 14:28:29
Hi,

Yes, I have the same kind of problem on Amiga. The simplest way for me
to trigger it is to have one thread write() "a"s while another
write()s "b"s. Logging in to the machine with rlogin while the program
runs is enough to make it dump core, apparently at the point where the
next thread switch would have occurred. The machine has very little
memory (8 MB).

I couldn't track it down, even though I spent days reading code and
memory dumps.

BTW, I noticed that libpthread uses movel %sp,%sp@- in a few places.
Could the behaviour of this instruction be non-deterministic? I
couldn't find anything in the m68k manuals that specifies its precise
behaviour, i.e. whether the value pushed is %sp before or after the
predecrement.

Regards,
 Aymeric

Paul Ripke <stix@stix.id.au> writes:

> I've been getting regular illegal-instruction core dumps on mac68k
> with pthread programs (including named). Looking at the core dumps in
> gdb, it seems as if something is jumping to a possibly random address.
> Does anyone else see this on other m68k ports?
>
> BTW: all of this is on a release build from CVS tag netbsd-4, running
> on an Apple Macintosh Quadra 605 (with a full 68040).
>
> Here's a dump from a program with only two threads:
>
> ksh$ gdb `whence fblckgen` fblckgen.core
> GNU gdb 5.3nb1
> Copyright 2002 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "m68k--netbsdelf"...
> Core was generated by `fblckgen'.
> Program terminated with signal 4, Illegal instruction.
> Reading symbols from /usr/lib/libpthread.so.0...done.
> Loaded symbols for /usr/lib/libpthread.so.0
> Reading symbols from /usr/lib/libc.so.12...done.
> Loaded symbols for /usr/lib/libc.so.12
> Reading symbols from /usr/libexec/ld.elf_so...done.
> Loaded symbols for /usr/libexec/ld.elf_so
> #0  0x049ffbe4 in ?? ()
> (gdb) thr app all bt
>
> Thread 3 (Thread 22 ()):
> #0  0x04023174 in pthread__locked_switch () from /usr/lib/libpthread.so.0
> #1  0x06bffb70 in ?? ()
> #2  0x040283b2 in pthread_cond_wait () from /usr/lib/libpthread.so.0
> #3  0x0000353e in makeBlocks (dummy=0x0) at fblckgen.c:234
> #4  0x040296ec in pthread_create () from /usr/lib/libpthread.so.0
>
> Thread 2 (LWP 2):
> #0  0x040584c2 in write () from /usr/lib/libc.so.12
> #1  0x04022fca in write () from /usr/lib/libpthread.so.0
> #2  0x000031be in main (argc=131072, argv=0x0) at fblckgen.c:179
>
> Thread 1 (LWP 1):
> #0  0x049ffbe4 in ?? ()
> #1  0x040283b2 in pthread_cond_wait () from /usr/lib/libpthread.so.0
> #2  0x0000353e in makeBlocks (dummy=0x0) at fblckgen.c:234
> #3  0x040296ec in pthread_create () from /usr/lib/libpthread.so.0
> #0  0x049ffbe4 in ?? ()
>
> Another program, with 3 threads:
>
> ksh$ gdb iohammer iohammer.core.4
> ...
> Core was generated by `iohammer'.
> Program terminated with signal 4, Illegal instruction.
> ...
> #0  0x043ffbe4 in ?? ()
> (gdb) thr app all bt
>
> Thread 4 (Thread 1 ()):
> #0  0x04025174 in pthread__locked_switch () from /usr/lib/libpthread.so.0
> #1  0xffffade0 in ?? ()
> #2  0x0402a3b2 in pthread_cond_wait () from /usr/lib/libpthread.so.0
> #3  0x0000367e in main (argc=0, argv=0x0) at iohammer.c:185
>
> Thread 3 (LWP 1):
> #0  0x0405a53e in read () from /usr/lib/libc.so.12
> #1  0x04024e4e in read () from /usr/lib/libpthread.so.0
> #2  0x00003bf0 in doIO (arg=0x0) at iohammer.c:351
> #3  0x0402b6ec in pthread_create () from /usr/lib/libpthread.so.0
>
> Thread 2 (LWP 2):
> #0  0x0405a53e in read () from /usr/lib/libc.so.12
> #1  0x04024e4e in read () from /usr/lib/libpthread.so.0
> #2  0x00003bf0 in doIO (arg=0x1) at iohammer.c:351
> #3  0x0402b6ec in pthread_create () from /usr/lib/libpthread.so.0
>
> Thread 1 (LWP 3):
> #0  0x043ffbe4 in ?? ()
> #0  0x043ffbe4 in ?? ()
>
> And finally, named:
>
> ksh$ sudo gdb /usr/sbin/named /etc/namedb/named.core
> ...
> Core was generated by `named'.
> Program terminated with signal 4, Illegal instruction.
> Reading symbols from /usr/lib/libpthread.so.0...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libpthread.so.0
> Reading symbols from /usr/lib/libcrypto.so.3...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libcrypto.so.3
> Reading symbols from /usr/lib/libc.so.12...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libc.so.12
> Reading symbols from /lib/libcrypt.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib/libcrypt.so.0
> Reading symbols from /usr/libexec/ld.elf_so...(no debugging symbols found)...done.
> Loaded symbols for /usr/libexec/ld.elf_so
> #0  0x049ff808 in ?? ()
> (gdb) thr app all bt
>
> Thread 5 (Thread 23 ()):
> #0  0x04147174 in pthread__locked_switch () from /usr/lib/libpthread.so.0
> #1  0x06fffb68 in ?? ()
> #2  0x0414c3b2 in pthread_cond_wait () from /usr/lib/libpthread.so.0
> #3  0x000f10bc in isc_timer_detach ()
> #4  0x0414d6ec in pthread_create () from /usr/lib/libpthread.so.0
>
> Thread 4 (LWP 2):
> #0  0x042ac51e in select () from /usr/lib/libc.so.12
> #1  0x04146f0e in select () from /usr/lib/libpthread.so.0
> #2  0x000e537e in isc_socket_detach ()
> #3  0x0414d6ec in pthread_create () from /usr/lib/libpthread.so.0
>
> Thread 3 (LWP 4):
> #0  0x042d8c12 in __sigtimedwait () from /usr/lib/libc.so.12
> #1  0x04148196 in sigtimedwait () from /usr/lib/libpthread.so.0
> #2  0x042ad2ec in sigwait () from /usr/lib/libc.so.12
> #3  0x000e8fc4 in isc_app_run ()
> #4  0x0001596c in main ()
> #5  0x000058e4 in __start ()
>
> Thread 2 (LWP 3):
> #0  0x00015ffe in ns_query_init ()
> #1  0x00008da0 in client_create ()
> #2  0x00009c00 in ns_clientmgr_createclients ()
> #3  0x0000de14 in ns_interface_listenudp ()
> #4  0x0000e062 in ns_interface_setup ()
> #5  0x0000e974 in do_scan ()
> #6  0x0000ed6a in ns_interfacemgr_scan0 ()
> #7  0x0000ede0 in ns_interfacemgr_scan ()
> #8  0x0001ebf8 in scan_interfaces ()
> #9  0x0001f6a0 in load_configuration ()
> #10 0x000208d6 in run_server ()
> #11 0x000fccd6 in isc_task_getcurrenttime ()
> #12 0x000fcdf6 in isc_task_getcurrenttime ()
> #13 0x0414d6ec in pthread_create () from /usr/lib/libpthread.so.0
>
> Thread 1 (LWP 5):
> #0  0x049ff808 in ?? ()
> #0  0x049ff808 in ?? ()
>
> Looking at these now, the PCs of the failing threads are very similar.
> I've tried PTHREAD_DEBUGLOG, but can't see anything meaningful there.
> Does anyone have any ideas before I go digging deeper?
>
> -- 
> Paul Ripke