pkgsrc-Bugs archive
Re: pkg/50939: Bug in GCC optionization causing i386 net-snmpd, to crash
The following reply was made to PR pkg/50939; it has been noted by GNATS.
From: Gavan Fantom <gavan%coolfactor.org@localhost>
To: gnats-bugs%netbsd.org@localhost, tech-toolchain%netbsd.org@localhost
Cc: dholland-pbugs%netbsd.org@localhost, maya%NetBSD.org@localhost, kivinen%iki.fi@localhost,
adam%netbsd.org@localhost
Subject: Re: pkg/50939: Bug in GCC optionization causing i386 net-snmpd, to
crash
Date: Fri, 6 Oct 2017 02:14:39 +0100
Some time ago, David Holland wrote:
> This sounds like it is overwriting its stack, probably in the mem_mib
> call. Then when it returns from the mem_mib call it manages to go to
> the wrong place. Can you check in the debugger if this is the case?
>
> What gets trashed if you overwrite the stack can depend heavily on
> compiler optimizations, so it's not necessarily a gcc bug.
>
> I don't see anything obviously wrong with the code, but that isn't
> conclusive.
>
> Also, is this happening on real i386, or in a 32-bit chroot on an
> amd64? Might also be a problem with the compat32 sysctl().
I have reproduced this on NetBSD 7.1 on a real i386 machine.
The problem appears to be a compiler bug. Consider the following code,
from the middle of netsnmp_cpu_arch_load:
       for (i = 0; i < cpu_num; i++) {
           netsnmp_cpu_info *ncpu = netsnmp_cpu_get_byIdx( i, 1 );
           size_t j = i * CPUSTATES;
           ncpu->user_ticks = (unsigned long long)ncpu_stats[j + CP_USER];
           ncpu->nice_ticks = (unsigned long long)ncpu_stats[j + CP_NICE];
           ncpu->sys2_ticks = (unsigned long long)ncpu_stats[j + CP_SYS]+cpu_stats[j + CP_INTR];
           ncpu->kern_ticks = (unsigned long long)ncpu_stats[j + CP_SYS];
           ncpu->idle_ticks = (unsigned long long)ncpu_stats[j + CP_IDLE];
           ncpu->intrpt_ticks = (unsigned long long)ncpu_stats[j + CP_INTR];
       }
This is translated into the following block of code (disassembled by
gdb). The block is entered via a conditional branch from elsewhere, if
cpu_num > 0.
  0xbba64c88 <+1039>: movl  $0x1,0x4(%esp)
  0xbba64c90 <+1047>: movl  $0x0,(%esp)
  0xbba64c97 <+1054>: call  0xbba09460 <netsnmp_cpu_get_byIdx@plt>
  0xbba64c9c <+1059>: mov   (%edi),%edx
  0xbba64c9e <+1061>: mov   0x4(%edi),%ecx
  0xbba64ca1 <+1064>: mov   %edx,0x2008(%eax)
  0xbba64ca7 <+1070>: mov   %ecx,0x200c(%eax)
  0xbba64cad <+1076>: mov   0x8(%edi),%edx
  0xbba64cb0 <+1079>: mov   0xc(%edi),%ecx
  0xbba64cb3 <+1082>: mov   %edx,0x2010(%eax)
  0xbba64cb9 <+1088>: mov   %ecx,0x2014(%eax)
  0xbba64cbf <+1094>: mov   0x10(%edi),%edx
  0xbba64cc2 <+1097>: mov   0x14(%edi),%ecx
  0xbba64cc5 <+1100>: add   0x54(%esp),%edx
  0xbba64cc9 <+1104>: adc   0x58(%esp),%ecx
  0xbba64ccd <+1108>: mov   %edx,0x2068(%eax)
  0xbba64cd3 <+1114>: mov   %ecx,0x206c(%eax)
  0xbba64cd9 <+1120>: mov   0x10(%edi),%edx
  0xbba64cdc <+1123>: mov   0x14(%edi),%ecx
  0xbba64cdf <+1126>: mov   %edx,0x2030(%eax)
  0xbba64ce5 <+1132>: mov   %ecx,0x2034(%eax)
  0xbba64ceb <+1138>: mov   0x20(%edi),%edx
  0xbba64cee <+1141>: mov   0x24(%edi),%ecx
  0xbba64cf1 <+1144>: mov   %edx,0x2020(%eax)
  0xbba64cf7 <+1150>: mov   %ecx,0x2024(%eax)
  0xbba64cfd <+1156>: mov   0x18(%edi),%edx
  0xbba64d00 <+1159>: mov   0x1c(%edi),%ecx
  0xbba64d03 <+1162>: mov   %edx,0x2038(%eax)
  0xbba64d09 <+1168>: mov   %ecx,0x203c(%eax)
  0xbba64d0f <+1174>: mov   -0x258(%ebx),%eax
  0xbba64d15 <+1180>: mov   (%eax),%eax
  0xbba64d17 <+1182>: cmp   $0x1,%eax
  0xbba64d1a <+1185>: jle   0xbba64ace <netsnmp_cpu_arch_load+597>
  0xbba64d20 <+1191>: movl  $0x1,0x4(%esp)
  0xbba64d28 <+1199>: movl  $0x1,(%esp)
  0xbba64d2f <+1206>: call  0xbba09460 <netsnmp_cpu_get_byIdx@plt>
  0xbba64d34 <+1211>: mov   0x28(%edi),%edx
  0xbba64d37 <+1214>: mov   0x2c(%edi),%ecx
  0xbba64d3a <+1217>: mov   %edx,0x2008(%eax)
  0xbba64d40 <+1223>: mov   %ecx,0x200c(%eax)
  0xbba64d46 <+1229>: mov   0x30(%edi),%esi
  0xbba64d49 <+1232>: mov   0x34(%edi),%edi
  0xbba64d4c <+1235>: mov   %esi,0x2010(%eax)
  0xbba64d52 <+1241>: mov   %edi,0x2014(%eax)
The branch to 0xbba64ace is a branch back to continue the normal
execution of the code, where free(...) is called and life carries on.
Note that the compiler appears to have partially unrolled the loop, but
the block ends here, partway through the second iteration. The next
block of code happens to be the error path for sysctl(mem_mib, ...)
failing, which logs "sysctl vm.vm_meter failed". This appears to be
purely coincidental; the real failure is that execution simply falls
off the end of this half-finished loop unrolling.
  0xbba64d58 <+1247>: call  0xbba0abf0 <__errno@plt>
  0xbba64d5d <+1252>: mov   (%eax),%eax
  0xbba64d5f <+1254>: mov   %eax,0x8(%esp)
  0xbba64d63 <+1258>: lea   -0x41e78(%ebx),%eax
  0xbba64d69 <+1264>: mov   %eax,0x4(%esp)
  0xbba64d6d <+1268>: movl  $0x3,(%esp)
  0xbba64d74 <+1275>: call  0xbba0af70 <snmp_log@plt>
  0xbba64d79 <+1280>: jmp   0xbba649cd <netsnmp_cpu_arch_load+340>
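For anyone matching the loads in the first disassembly back to the
source, here is a small sketch of the offset arithmetic. It assumes the
statistics array has 8-byte elements (inferred from the paired 32-bit
loads) and that the CP_* indices have the values from NetBSD's
<sys/sched.h>, repeated here so the sketch stands alone. stat_offset is
a hypothetical helper, not something from net-snmp.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match NetBSD's <sys/sched.h>; copied so this compiles alone. */
enum { CP_USER = 0, CP_NICE = 1, CP_SYS = 2, CP_INTR = 3, CP_IDLE = 4,
       CPUSTATES = 5 };

/*
 * Byte offset of element [cpu * CPUSTATES + state] from the start of the
 * array, assuming 8-byte elements. Hypothetical helper for illustration.
 */
size_t stat_offset(int cpu, int state)
{
    return ((size_t)cpu * CPUSTATES + (size_t)state) * sizeof(uint64_t);
}
```

Under those assumptions, iteration 0 reads offsets 0x0, 0x8, 0x10, 0x20
and 0x18 from %edi, and iteration 1 starts at 0x28 and 0x30, which is
where the second copy of the loop body begins before it stops.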
It does look like a machine with only one CPU would be spared this fate
as it would exit the loop after the first iteration and not try to
execute the second, incomplete, iteration. This problem should be
reproducible on any NetBSD/i386 machine with at least 2 CPUs.
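A rough sketch of a standalone check for a loop of this shape follows.
It has not been confirmed to trigger the miscompilation; it only
verifies that every iteration ran. All names (cpu_rec, rec_by_idx,
fill_recs, check_fill) are hypothetical stand-ins, and both input
arrays are sized for every CPU, which is an assumption because the real
declarations are not shown in this message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not net-snmp definitions. */
enum { ST_USER, ST_NICE, ST_SYS, ST_INTR, ST_IDLE, N_STATES };
#define MAX_CPUS 8

struct cpu_rec {
    unsigned long long user, nice, sys2, kern, idle, intrpt;
};

static struct cpu_rec recs[MAX_CPUS];

/* noinline (a GCC/Clang extension) keeps a real call inside the loop body. */
__attribute__((noinline))
static struct cpu_rec *rec_by_idx(int idx)
{
    return &recs[idx];
}

/* Same shape as the quoted loop: one call, then six 64-bit stores. */
static void fill_recs(const uint64_t *per_cpu, const uint64_t *other,
                      int cpu_num)
{
    int i;

    for (i = 0; i < cpu_num; i++) {
        struct cpu_rec *r = rec_by_idx(i);
        size_t j = (size_t)i * N_STATES;

        r->user   = (unsigned long long)per_cpu[j + ST_USER];
        r->nice   = (unsigned long long)per_cpu[j + ST_NICE];
        r->sys2   = (unsigned long long)per_cpu[j + ST_SYS] + other[j + ST_INTR];
        r->kern   = (unsigned long long)per_cpu[j + ST_SYS];
        r->idle   = (unsigned long long)per_cpu[j + ST_IDLE];
        r->intrpt = (unsigned long long)per_cpu[j + ST_INTR];
    }
}

/*
 * Fills cpu_num records from known inputs and returns how many of them
 * do not hold the expected values (0 means every iteration completed).
 * Returns -1 if cpu_num is out of range.
 */
int check_fill(int cpu_num)
{
    uint64_t per_cpu[MAX_CPUS * N_STATES];
    uint64_t other[MAX_CPUS * N_STATES];
    int i, bad = 0;

    if (cpu_num < 0 || cpu_num > MAX_CPUS)
        return -1;

    /* Distinct, nonzero inputs so a skipped store is detectable. */
    for (i = 0; i < MAX_CPUS * N_STATES; i++) {
        per_cpu[i] = 1000u + (uint64_t)i;
        other[i] = 7u * (uint64_t)i;
    }
    memset(recs, 0, sizeof recs);
    fill_recs(per_cpu, other, cpu_num);

    for (i = 0; i < cpu_num; i++) {
        size_t j = (size_t)i * N_STATES;
        const struct cpu_rec *r = &recs[i];

        if (r->user != per_cpu[j + ST_USER] ||
            r->nice != per_cpu[j + ST_NICE] ||
            r->sys2 != per_cpu[j + ST_SYS] + other[j + ST_INTR] ||
            r->kern != per_cpu[j + ST_SYS] ||
            r->idle != per_cpu[j + ST_IDLE] ||
            r->intrpt != per_cpu[j + ST_INTR])
            bad++;
    }
    return bad;
}
```

Calling check_fill(2) from a small main() and comparing builds with and
without optimisation would show whether the second iteration completes
on a given compiler; a nonzero return or a crash would mean it did not.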
Obviously in the short term, the package will need to work around this
by disabling optimisation, but this is clearly something the compiler is
getting wrong.