Re: pkg/50939: Bug in GCC optionization causing i386 net-snmpd, to crash

To: adam%NetBSD.org@localhost, gnats-admin%netbsd.org@localhost, pkgsrc-bugs%netbsd.org@localhost, kivinen%iki.fi@localhost
Subject: Re: pkg/50939: Bug in GCC optionization causing i386 net-snmpd, to crash
From: Gavan Fantom <gavan%coolfactor.org@localhost>
Date: Fri, 6 Oct 2017 04:45:00 +0000 (UTC)

The following reply was made to PR pkg/50939; it has been noted by GNATS.

From: Gavan Fantom <gavan%coolfactor.org@localhost>
To: gnats-bugs%netbsd.org@localhost, tech-toolchain%netbsd.org@localhost
Cc: dholland-pbugs%netbsd.org@localhost, maya%NetBSD.org@localhost, kivinen%iki.fi@localhost,
 adam%netbsd.org@localhost
Subject: Re: pkg/50939: Bug in GCC optionization causing i386 net-snmpd, to
 crash
Date: Fri, 6 Oct 2017 02:14:39 +0100

 Some time ago, David Holland wrote:
 >   This sounds like it is overwriting its stack, probably in the mem_mib
 >   call. Then when it returns from the mem_mib call it manages to go to
 >   the wrong place. Can you check in the debugger if this is the case?
 >
 >   What gets trashed if you overwrite the stack can depend heavily on
 >   compiler optimizations, so it's not necessarily a gcc bug.
 >
 >   I don't see anything obviously wrong with the code, but that isn't
 >   conclusive.
 >
 >   Also, is this happening on real i386, or in a 32-bit chroot on an
 >   amd64? Might also be a problem with the compat32 sysctl().
 
 I have reproduced this on NetBSD 7.1 on a real i386 machine.
 
 The problem appears to be a compiler bug. Consider the following code, 
 from the middle of netsnmp_cpu_arch_load:
 
         for (i = 0; i < cpu_num; i++) {
             netsnmp_cpu_info  *ncpu = netsnmp_cpu_get_byIdx( i, 1 );
             size_t j = i * CPUSTATES;
             ncpu->user_ticks = (unsigned long long)ncpu_stats[j + CP_USER];
             ncpu->nice_ticks = (unsigned long long)ncpu_stats[j + CP_NICE];
             ncpu->sys2_ticks = (unsigned long long)ncpu_stats[j + CP_SYS]+cpu_stats[j + CP_INTR];
             ncpu->kern_ticks = (unsigned long long)ncpu_stats[j + CP_SYS];
             ncpu->idle_ticks = (unsigned long long)ncpu_stats[j + CP_IDLE];
             ncpu->intrpt_ticks = (unsigned long long)ncpu_stats[j + CP_INTR];
         }
 
 This is translated into the following block of code (disassembled by
 gdb). The block is entered via a conditional branch from elsewhere in the
 function when cpu_num > 0.
 
    0xbba64c88 <+1039>:  movl   $0x1,0x4(%esp)
    0xbba64c90 <+1047>:  movl   $0x0,(%esp)
    0xbba64c97 <+1054>:  call   0xbba09460 <netsnmp_cpu_get_byIdx@plt>
    0xbba64c9c <+1059>:  mov    (%edi),%edx
    0xbba64c9e <+1061>:  mov    0x4(%edi),%ecx
    0xbba64ca1 <+1064>:  mov    %edx,0x2008(%eax)
    0xbba64ca7 <+1070>:  mov    %ecx,0x200c(%eax)
    0xbba64cad <+1076>:  mov    0x8(%edi),%edx
    0xbba64cb0 <+1079>:  mov    0xc(%edi),%ecx
    0xbba64cb3 <+1082>:  mov    %edx,0x2010(%eax)
    0xbba64cb9 <+1088>:  mov    %ecx,0x2014(%eax)
    0xbba64cbf <+1094>:  mov    0x10(%edi),%edx
    0xbba64cc2 <+1097>:  mov    0x14(%edi),%ecx
    0xbba64cc5 <+1100>:  add    0x54(%esp),%edx
    0xbba64cc9 <+1104>:  adc    0x58(%esp),%ecx
    0xbba64ccd <+1108>:  mov    %edx,0x2068(%eax)
    0xbba64cd3 <+1114>:  mov    %ecx,0x206c(%eax)
    0xbba64cd9 <+1120>:  mov    0x10(%edi),%edx
    0xbba64cdc <+1123>:  mov    0x14(%edi),%ecx
    0xbba64cdf <+1126>:  mov    %edx,0x2030(%eax)
    0xbba64ce5 <+1132>:  mov    %ecx,0x2034(%eax)
    0xbba64ceb <+1138>:  mov    0x20(%edi),%edx
    0xbba64cee <+1141>:  mov    0x24(%edi),%ecx
    0xbba64cf1 <+1144>:  mov    %edx,0x2020(%eax)
    0xbba64cf7 <+1150>:  mov    %ecx,0x2024(%eax)
    0xbba64cfd <+1156>:  mov    0x18(%edi),%edx
    0xbba64d00 <+1159>:  mov    0x1c(%edi),%ecx
    0xbba64d03 <+1162>:  mov    %edx,0x2038(%eax)
    0xbba64d09 <+1168>:  mov    %ecx,0x203c(%eax)
    0xbba64d0f <+1174>:  mov    -0x258(%ebx),%eax
    0xbba64d15 <+1180>:  mov    (%eax),%eax
    0xbba64d17 <+1182>:  cmp    $0x1,%eax
    0xbba64d1a <+1185>:  jle    0xbba64ace <netsnmp_cpu_arch_load+597>
    0xbba64d20 <+1191>:  movl   $0x1,0x4(%esp)
    0xbba64d28 <+1199>:  movl   $0x1,(%esp)
    0xbba64d2f <+1206>:  call   0xbba09460 <netsnmp_cpu_get_byIdx@plt>
    0xbba64d34 <+1211>:  mov    0x28(%edi),%edx
    0xbba64d37 <+1214>:  mov    0x2c(%edi),%ecx
    0xbba64d3a <+1217>:  mov    %edx,0x2008(%eax)
    0xbba64d40 <+1223>:  mov    %ecx,0x200c(%eax)
    0xbba64d46 <+1229>:  mov    0x30(%edi),%esi
    0xbba64d49 <+1232>:  mov    0x34(%edi),%edi
    0xbba64d4c <+1235>:  mov    %esi,0x2010(%eax)
    0xbba64d52 <+1241>:  mov    %edi,0x2014(%eax)
 
 The branch to 0xbba64ace (+597) jumps back to the normal execution path,
 where free(...) is called and life carries on.
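Two patterns in the listing are worth decoding. First, %edi presumably points at ncpu_stats, and each displacement from it is (j + state) * 8, assuming NetBSD's <sys/sched.h> values (CP_USER 0, CP_NICE 1, CP_SYS 2, CP_INTR 3, CP_IDLE 4, CPUSTATES 5) and 64-bit counters. Second, because i386 has only 32-bit integer registers, each unsigned long long is moved as two halves (%edx low, %ecx high), and the sys2_ticks addition at +1100/+1104 is an add of the low words followed by an adc (add with carry) of the high words. A minimal sketch of both; stat_offset and add64_halves are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU state indices as in NetBSD's <sys/sched.h> (assumed here). */
enum { CP_USER = 0, CP_NICE = 1, CP_SYS = 2, CP_INTR = 3, CP_IDLE = 4, CPUSTATES = 5 };

/* Hypothetical helper: byte displacement of ncpu_stats[cpu * CPUSTATES + state]
 * from the array base (presumably %edi in the listing), assuming 64-bit counters. */
static size_t stat_offset(size_t cpu, size_t state)
{
    size_t j = cpu * CPUSTATES;              /* same j as in the source loop */
    return (j + state) * sizeof(uint64_t);
}

/* Hypothetical illustration of the add/adc pair: a 64-bit sum computed
 * from 32-bit halves, propagating the carry out of the low word. */
static uint64_t add64_halves(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)a + (uint32_t)b;                          /* add */
    uint32_t carry = lo < (uint32_t)a;                         /* carry flag */
    uint32_t hi = (uint32_t)(a >> 32) + (uint32_t)(b >> 32) + carry;  /* adc */
    return ((uint64_t)hi << 32) | lo;
}
```

With these values stat_offset(1, CP_USER) is 0x28 and stat_offset(1, CP_NICE) is 0x30, matching the loads at +1211 and +1229 for the second CPU.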
 
 Note that the compiler appears to have partially unrolled the loop, but
 this is the end of that block of code: the second iteration stops after
 storing user_ticks and nice_ticks. The next block of code happens to be
 the error path for a failed sysctl(mem_mib, ...) call, which logs "sysctl
 vm.vm_meter failed". This appears to be purely coincidental; the real
 failure here is that execution simply falls off the end of this
 half-finished loop unrolling.
 
    0xbba64d58 <+1247>:  call   0xbba0abf0 <__errno@plt>
    0xbba64d5d <+1252>:  mov    (%eax),%eax
    0xbba64d5f <+1254>:  mov    %eax,0x8(%esp)
    0xbba64d63 <+1258>:  lea    -0x41e78(%ebx),%eax
    0xbba64d69 <+1264>:  mov    %eax,0x4(%esp)
    0xbba64d6d <+1268>:  movl   $0x3,(%esp)
    0xbba64d74 <+1275>:  call   0xbba0af70 <snmp_log@plt>
    0xbba64d79 <+1280>:  jmp    0xbba649cd <netsnmp_cpu_arch_load+340>
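The half-finished unrolling is easiest to see by tabulating the 64-bit stores into the netsnmp_cpu_info structure on each unrolled pass. The offsets below are copied from the disassembly; the field names are inferred by matching each store's source displacement in ncpu_stats against the loop body, and the array names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Destination offsets (relative to the netsnmp_cpu_info pointer in %eax)
 * of the 64-bit stores, one table per unrolled iteration. Each store is
 * a pair of 32-bit moves; only the low-word offset is listed. */
static const unsigned iter0_stores[] = {
    0x2008, /* user_ticks   */
    0x2010, /* nice_ticks   */
    0x2068, /* sys2_ticks   */
    0x2030, /* kern_ticks   */
    0x2020, /* idle_ticks   */
    0x2038, /* intrpt_ticks */
};

static const unsigned iter1_stores[] = {
    0x2008, /* user_ticks */
    0x2010, /* nice_ticks */
    /* nothing further: execution falls through to the __errno call */
};

/* The source loop body assigns six fields per CPU. */
enum { FIELDS_PER_CPU = 6 };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))
```

The first pass performs all six stores the loop body asks for; the second performs only two before reaching the unrelated error path.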
 
 It does look like a machine with only one CPU would be spared this fate:
 the comparison against 1 at +1182/+1185 sends it back to the normal path
 after the first iteration, so it never reaches the second, incomplete,
 iteration. This problem should be reproducible on any NetBSD/i386 machine
 with at least 2 CPUs.
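The reasoning about CPU counts can be restated as three paths through the emitted code: the block is entered only when cpu_num > 0 (the conditional branch into +1039), and the cmp $0x1,%eax / jle pair at +1182/+1185 returns to the normal path when cpu_num <= 1. A sketch of that mapping; the enum and function names are hypothetical:

```c
#include <assert.h>

/* Hypothetical labels for the paths through the emitted code. */
enum path {
    PATH_SKIP_LOOP,      /* cpu_num <= 0: block never entered              */
    PATH_NORMAL_RETURN,  /* cpu_num == 1: jle at +1185 back to +597        */
    PATH_FALLS_THROUGH   /* cpu_num >= 2: truncated second iteration, then
                            the "sysctl vm.vm_meter failed" error path     */
};

static enum path emitted_path(int cpu_num)
{
    if (cpu_num <= 0)
        return PATH_SKIP_LOOP;      /* conditional branch before +1039   */
    if (cpu_num <= 1)
        return PATH_NORMAL_RETURN;  /* cmp $0x1,%eax ; jle (signed test) */
    return PATH_FALLS_THROUGH;
}
```

Only cpu_num >= 2 reaches the truncated second iteration.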
 
 Obviously in the short term, the package will need to work around this 
 by disabling optimisation, but this is clearly something the compiler is 
 getting wrong.
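Rather than lowering optimisation for the whole package, GCC's optimize function attribute can in principle restrict -O0 to a single function (GCC documents it as intended mainly for debugging, and other compilers may ignore it with a warning). A minimal, self-contained sketch of the loop under that attribute. The struct, the name load_cpu_stats, and the reading of sys2_ticks as the per-CPU sum of CP_SYS and CP_INTR (so both operands index ncpu_stats, whereas the quoted source reads the second operand from cpu_stats) are all assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU state indices as in NetBSD's <sys/sched.h> (assumed here). */
enum { CP_USER = 0, CP_NICE = 1, CP_SYS = 2, CP_INTR = 3, CP_IDLE = 4, CPUSTATES = 5 };

/* Hypothetical stand-in for netsnmp_cpu_info: only the fields the loop sets. */
struct cpu_info {
    unsigned long long user_ticks, nice_ticks, sys2_ticks;
    unsigned long long kern_ticks, idle_ticks, intrpt_ticks;
};

/* GCC-specific: compile just this function without optimisation. */
__attribute__((optimize("O0")))
static void load_cpu_stats(struct cpu_info *cpus, const uint64_t *ncpu_stats,
                           int cpu_num)
{
    for (int i = 0; i < cpu_num; i++) {
        struct cpu_info *ncpu = &cpus[i];
        size_t j = (size_t)i * CPUSTATES;   /* first counter of CPU i */
        ncpu->user_ticks   = ncpu_stats[j + CP_USER];
        ncpu->nice_ticks   = ncpu_stats[j + CP_NICE];
        /* assumed: per-CPU CP_SYS + CP_INTR, both read from ncpu_stats */
        ncpu->sys2_ticks   = ncpu_stats[j + CP_SYS] + ncpu_stats[j + CP_INTR];
        ncpu->kern_ticks   = ncpu_stats[j + CP_SYS];
        ncpu->idle_ticks   = ncpu_stats[j + CP_IDLE];
        ncpu->intrpt_ticks = ncpu_stats[j + CP_INTR];
    }
}
```

With two CPUs, every field of cpus[1] is filled from ncpu_stats[5] through ncpu_stats[9].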
