tech-userlevel archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Odd pigz behavior on build cluster - libpthread bug?



On Sat, Jun 14, 2014 at 09:31:18AM -0400, Thor Lancelot Simon wrote:
> 26269      1 nbpigz   CALL  ___lwp_park50(0,0,0x7f7ff6800030,0x7f7ff6800030)
> 26269      3 nbpigz   CALL  _lwp_ctl(1,0x7f7ff6400208)
> 26269      3 nbpigz   RET   _lwp_ctl 0
> 26269      3 nbpigz   CALL  mmap(0,0x1000,3,0x1002,0xffffffff,0,0)
> [mmap more stuff]
> 26269      3 nbpigz   CALL  _lwp_unpark(2,0x7f7ff7b030f8)
>  26269      3 nbpigz   RET   _lwp_unpark 0
>  26269      2 nbpigz   CSW   stop kernel
>  26269      2 nbpigz   CSW   resume kernel
>  26269      2 nbpigz   RET   ___lwp_park50 0
>  26269      2 nbpigz   CALL  write(1,0x7f7ff7747000,0x1127)
>  26269      3 nbpigz   CALL  ___lwp_park50(0,0,0x7f7ff7b03098,0x7f7ff7b03098)
>  26269      2 nbpigz   GIO   fd 1 wrote 4088 bytes
> [after the write, there is a short burst of repeated calls to lwp_self
>  and sigprocmask, then the cycle seems to resume]

Correction - the cycle did *not* resume.  The process did one more -- very
short -- write to stdout, then exited.

Perhaps one thread is spinning in some way waiting for the others to finish
work, and attacking it with ktrace -t A resolved a race of some kind.

The other pigz process in question, which I didn't ktrace, was still
"happily" spinning along.  Perhaps not surprisingly, attaching GDB to
try to get a backtrace allowed it to exit as well.  "Yay".

The backtrace isn't terribly useful:

(gdb) attach 21425
Attaching to process 21425
Reading symbols from
/home/builds/ab/HEAD/amd64/201406141020Z-tools/bin/nbpigz...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libpthread.so.1...(no debugging symbols
found)...done.
[New LWP 2]
Loaded symbols for /usr/lib/libpthread.so.1
Reading symbols from /usr/lib/libc.so.12...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libc.so.12
Reading symbols from /usr/lib/libgcc_s.so.1...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libgcc_s.so.1
Reading symbols from /usr/libexec/ld.elf_so...(no debugging symbols
found)...done.
Loaded symbols for /usr/libexec/ld.elf_so
[Switching to LWP 1]
0x00007f7ff7c0d818 in strcmp () from /usr/libexec/ld.elf_so
(gdb) where
#0  0x00007f7ff7c0d818 in strcmp () from /usr/libexec/ld.elf_so
#1  0x00007f7ff7c073cf in _rtld_symlook_obj () from /usr/libexec/ld.elf_so
#2  0x00007f7ff7c079a9 in _rtld_symlook_default.clone.3 ()
   from /usr/libexec/ld.elf_so
#3  0x00007f7ff7c07f23 in _rtld_find_plt_symdef () from /usr/libexec/ld.elf_so
#4  0x00007f7ff7c03fbd in _rtld_bind () from /usr/libexec/ld.elf_so
#5  0x00007f7ff7c03c1d in _rtld_bind_start () from /usr/libexec/ld.elf_so
#6  0x0000000000000246 in ?? ()
#7  0x00007f7ff707624a in getcontext () from /usr/lib/libc.so.12
#8  0x00000000003fe000 in ?? ()
#9  0x00007f7ff6802000 in ?? ()
#10 0x0000000000408e92 in twist ()
#11 0x0000000000000000 in ?? ()
(gdb) detach

Thor


Home | Main Index | Thread Index | Old Index