pkgsrc-Bugs archive


Re: pkg/50939: Bug in GCC optionization causing i386 net-snmpd, to crash



The following reply was made to PR pkg/50939; it has been noted by GNATS.

From: Gavan Fantom <gavan%coolfactor.org@localhost>
To: gnats-bugs%netbsd.org@localhost, tech-toolchain%netbsd.org@localhost
Cc: dholland-pbugs%netbsd.org@localhost, maya%NetBSD.org@localhost, kivinen%iki.fi@localhost,
 adam%netbsd.org@localhost
Subject: Re: pkg/50939: Bug in GCC optionization causing i386 net-snmpd, to
 crash
Date: Fri, 6 Oct 2017 02:14:39 +0100

 Some time ago, David Holland wrote:
 >   This sounds like it is overwriting its stack, probably in the mem_mib
 >   call. Then when it returns from the mem_mib call it manages to go to
 >   the wrong place. Can you check in the debugger if this is the case?
 >
 >   What gets trashed if you overwrite the stack can depend heavily on
 >   compiler optimizations, so it's not necessarily a gcc bug.
 >
 >   I don't see anything obviously wrong with the code, but that isn't
 >   conclusive.
 >
 >   Also, is this happening on real i386, or in a 32-bit chroot on an
 >   amd64? Might also be a problem with the compat32 sysctl().
 
 I have reproduced this on NetBSD 7.1 on a real i386 machine.
 
 The problem appears to be a compiler bug. Consider the following code, 
 from the middle of netsnmp_cpu_arch_load:
 
          for (i = 0; i < cpu_num; i++) {
              netsnmp_cpu_info  *ncpu = netsnmp_cpu_get_byIdx( i, 1 );
              size_t j = i * CPUSTATES;
              ncpu->user_ticks = (unsigned long long)ncpu_stats[j + CP_USER];
              ncpu->nice_ticks = (unsigned long long)ncpu_stats[j + CP_NICE];
              ncpu->sys2_ticks = (unsigned long long)ncpu_stats[j + CP_SYS]+cpu_stats[j + CP_INTR];
              ncpu->kern_ticks = (unsigned long long)ncpu_stats[j + CP_SYS];
              ncpu->idle_ticks = (unsigned long long)ncpu_stats[j + CP_IDLE];
              ncpu->intrpt_ticks = (unsigned long long)ncpu_stats[j + CP_INTR];
          }
 
 This is translated into the following block of code (disassembled by 
 gdb). The block is entered via a conditional branch from elsewhere, if 
 cpu_num > 0.
 
     0xbba64c88 <+1039>:  movl   $0x1,0x4(%esp)
     0xbba64c90 <+1047>:  movl   $0x0,(%esp)
     0xbba64c97 <+1054>:  call   0xbba09460 <netsnmp_cpu_get_byIdx@plt>
     0xbba64c9c <+1059>:  mov    (%edi),%edx
     0xbba64c9e <+1061>:  mov    0x4(%edi),%ecx
     0xbba64ca1 <+1064>:  mov    %edx,0x2008(%eax)
     0xbba64ca7 <+1070>:  mov    %ecx,0x200c(%eax)
     0xbba64cad <+1076>:  mov    0x8(%edi),%edx
     0xbba64cb0 <+1079>:  mov    0xc(%edi),%ecx
     0xbba64cb3 <+1082>:  mov    %edx,0x2010(%eax)
     0xbba64cb9 <+1088>:  mov    %ecx,0x2014(%eax)
     0xbba64cbf <+1094>:  mov    0x10(%edi),%edx
     0xbba64cc2 <+1097>:  mov    0x14(%edi),%ecx
     0xbba64cc5 <+1100>:  add    0x54(%esp),%edx
     0xbba64cc9 <+1104>:  adc    0x58(%esp),%ecx
     0xbba64ccd <+1108>:  mov    %edx,0x2068(%eax)
     0xbba64cd3 <+1114>:  mov    %ecx,0x206c(%eax)
     0xbba64cd9 <+1120>:  mov    0x10(%edi),%edx
     0xbba64cdc <+1123>:  mov    0x14(%edi),%ecx
     0xbba64cdf <+1126>:  mov    %edx,0x2030(%eax)
     0xbba64ce5 <+1132>:  mov    %ecx,0x2034(%eax)
     0xbba64ceb <+1138>:  mov    0x20(%edi),%edx
     0xbba64cee <+1141>:  mov    0x24(%edi),%ecx
     0xbba64cf1 <+1144>:  mov    %edx,0x2020(%eax)
     0xbba64cf7 <+1150>:  mov    %ecx,0x2024(%eax)
     0xbba64cfd <+1156>:  mov    0x18(%edi),%edx
     0xbba64d00 <+1159>:  mov    0x1c(%edi),%ecx
     0xbba64d03 <+1162>:  mov    %edx,0x2038(%eax)
     0xbba64d09 <+1168>:  mov    %ecx,0x203c(%eax)
     0xbba64d0f <+1174>:  mov    -0x258(%ebx),%eax
     0xbba64d15 <+1180>:  mov    (%eax),%eax
     0xbba64d17 <+1182>:  cmp    $0x1,%eax
     0xbba64d1a <+1185>:  jle    0xbba64ace <netsnmp_cpu_arch_load+597>
     0xbba64d20 <+1191>:  movl   $0x1,0x4(%esp)
     0xbba64d28 <+1199>:  movl   $0x1,(%esp)
     0xbba64d2f <+1206>:  call   0xbba09460 <netsnmp_cpu_get_byIdx@plt>
     0xbba64d34 <+1211>:  mov    0x28(%edi),%edx
     0xbba64d37 <+1214>:  mov    0x2c(%edi),%ecx
     0xbba64d3a <+1217>:  mov    %edx,0x2008(%eax)
     0xbba64d40 <+1223>:  mov    %ecx,0x200c(%eax)
     0xbba64d46 <+1229>:  mov    0x30(%edi),%esi
     0xbba64d49 <+1232>:  mov    0x34(%edi),%edi
     0xbba64d4c <+1235>:  mov    %esi,0x2010(%eax)
     0xbba64d52 <+1241>:  mov    %edi,0x2014(%eax)
 
 The branch to 0xbba64ace is a branch back to continue the normal 
 execution of the code, where free(...) is called and life carries on.
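 
 As a reading aid, the load offsets in the listing can be matched to the
 source indices. The following is only a sketch under assumptions that
 are not stated in this report: that %edi holds the address of
 ncpu_stats, that each element is an 8-byte counter, and that the CP_*
 constants are numbered CP_USER = 0 through CP_IDLE = 4 with
 CPUSTATES = 5. The helper name stat_offset is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed CPU state indices; these values are not quoted in the report. */
enum { CP_USER = 0, CP_NICE = 1, CP_SYS = 2, CP_INTR = 3, CP_IDLE = 4,
       CPUSTATES = 5 };

/*
 * Hypothetical helper: byte offset of ncpu_stats[i * CPUSTATES + state]
 * from the start of the array, assuming 64-bit elements.
 */
static size_t stat_offset(size_t i, size_t state)
{
    size_t j = i * CPUSTATES;            /* same j as in the source loop */
    return (j + state) * sizeof(uint64_t);
}
```

 Under these assumptions, stat_offset(0, CP_IDLE) is 0x20 and
 stat_offset(1, CP_USER) is 0x28, which agrees with the loads at <+1138>
 and <+1211>.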
 
 Note that the compiler appears to have partially unrolled the loop, but 
 the second iteration stops after the nice_ticks store, and that is the 
 end of that block of code. The next block of code happens to be the 
 cleanup code for a failing sysctl(mem_mib, ...) call, which logs 
 "sysctl vm.vm_meter failed". This appears to be purely coincidental, 
 and the real failure here is that execution just falls off the end of 
 this half-finished loop unrolling.
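 
 The control flow described here can be modelled in C. This is only a
 sketch of the shape of the disassembly, not compiler output. The names
 emitted_shape, CONTINUES_NORMALLY and FALLS_THROUGH are invented for
 illustration, and the model assumes the block is entered only when
 cpu_num > 0, as noted earlier.

```c
/* What happens after the emitted loop code has run. */
enum outcome { CONTINUES_NORMALLY, FALLS_THROUGH };

/*
 * Hypothetical model of the emitted code for the per-CPU loop.
 * full_iterations receives the number of iterations that stored all
 * six fields.
 */
static enum outcome emitted_shape(int cpu_num, int *full_iterations)
{
    *full_iterations = 0;

    /* <+1039> .. <+1168>: iteration i = 0, all six fields stored. */
    (*full_iterations)++;

    /* <+1174> .. <+1185>: reload cpu_num, cmp $0x1, jle <+597>. */
    if (cpu_num <= 1)
        return CONTINUES_NORMALLY;

    /*
     * <+1191> .. <+1241>: iteration i = 1 begins, but only user_ticks
     * and nice_ticks are stored. No loop test or jump follows, so the
     * next instruction executed is <+1247>, the unrelated error path.
     */
    return FALLS_THROUGH;
}
```

 In this model a call with cpu_num == 1 returns CONTINUES_NORMALLY,
 while any cpu_num >= 2 returns FALLS_THROUGH after a single complete
 iteration.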
 
     0xbba64d58 <+1247>:  call   0xbba0abf0 <__errno@plt>
     0xbba64d5d <+1252>:  mov    (%eax),%eax
     0xbba64d5f <+1254>:  mov    %eax,0x8(%esp)
     0xbba64d63 <+1258>:  lea    -0x41e78(%ebx),%eax
     0xbba64d69 <+1264>:  mov    %eax,0x4(%esp)
     0xbba64d6d <+1268>:  movl   $0x3,(%esp)
     0xbba64d74 <+1275>:  call   0xbba0af70 <snmp_log@plt>
     0xbba64d79 <+1280>:  jmp    0xbba649cd <netsnmp_cpu_arch_load+340>
 
 It does look like a machine with only one CPU would be spared this fate 
 as it would exit the loop after the first iteration and not try to 
 execute the second, incomplete, iteration. This problem should be 
 reproducible on any NetBSD/i386 machine with at least 2 CPUs.
 
 Obviously in the short term, the package will need to work around this 
 by disabling optimisation, but this is clearly something the compiler is 
 getting wrong.
 

