Subject: odd panic in syn_cache_add
To: None <netbsd-users@netbsd.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: netbsd-users
Date: 11/24/2003 14:00:07
I just had a box panic suddenly.  The machine is i386, pretty recent
1.6.1 plus some custom changes (in IPsec).  This machine has crashed
before.  This time I got a core dump.

It looks like pool_get managed to exit such that a trap was caught
just before or while the next instruction was being executed.

I enclose backtrace, showing the frame, and a disass of the region,
annotated, with questions.  Any clues appreciated.

  (gdb) bt
  #0  0x1 in ?? ()
  #1  0xc02ad313 in cpu_reboot (howto=256, bootstr=0x0)
      at /home/gdt/QUIST-current/netbsd/src/sys/arch/i386/compile/FNORD/../../../../arch/i386/i386/machdep.c:2242
  #2  0xc01f51d7 in panic ()
      at /home/gdt/QUIST-current/netbsd/src/sys/arch/i386/compile/FNORD/../../../../kern/subr_prf.c:253
  #3  0xc02b41fa in trap (frame={tf_gs = -1059192816, tf_fs = -1058144240, 
	tf_es = -481099760, tf_ds = -1071054832, tf_edi = -481067544, 
	tf_esi = -1069554336, tf_ebp = -481068192, tf_ebx = -1059732888, 
	tf_edx = 0, tf_ecx = -1059729448, tf_eax = -1059733452, tf_trapno = 6, 
	tf_err = 2, tf_eip = -1071695320, tf_cs = 8, tf_eflags = 66178, 
	tf_esp = 0, tf_ss = 0, tf_vm86_es = -481067544, 
	tf_vm86_ds = -1071268017, tf_vm86_fs = 1073741823, tf_vm86_gs = 0})
      at /home/gdt/QUIST-current/netbsd/src/sys/arch/i386/compile/FNORD/../../../../arch/i386/i386/trap.c:231
  #4  0xc0100c7d in calltrap ()
  #5  0xc025bfff in syn_cache_add (src=0xe3537db0, dst=0xe3537dcc, 
      th=0xc10efb4c, hlen=0, so=0xc0ecbc80, m=0xc10efb00, 
      optp=0xc10efb60 "\002\004\201\234\001\003\003", optlen=20, oi=0xe3537e50)
      at /home/gdt/QUIST-current/netbsd/src/sys/arch/i386/compile/FNORD/../../../../netinet/tcp_input.c:3507
  #6  0xc02586d6 in tcp_input (m=0xc10efb00)
      at /home/gdt/QUIST-current/netbsd/src/sys/arch/i386/compile/FNORD/../../../../netinet/tcp_input.c:1374
  #7  0xc0248080 in ip_input (m=0xc10efb00)
      at /home/gdt/QUIST-current/netbsd/src/sys/arch/i386/compile/FNORD/../../../../netinet/ip_input.c:799
  #8  0xc024792f in ipintr ()
      at /home/gdt/QUIST-current/netbsd/src/sys/arch/i386/compile/FNORD/../../../../netinet/ip_input.c:381
  #9  0xc010212c in Xsoftnet ()
  #10 0xc02084d2 in sys_connect (p=0xe351e570, v=0xe3537f80, retval=0xe3537f78)
      at /home/gdt/QUIST-current/netbsd/src/sys/arch/i386/compile/FNORD/../../../../kern/uipc_syscalls.c:277
  #11 0xc02b3d0b in syscall_plain (frame={tf_gs = 31, tf_fs = 145031199, 
	tf_es = -1078001633, tf_ds = 31, tf_edi = 17, tf_esi = 17, 
	tf_ebp = -1077970384, tf_ebx = 136342856, tf_edx = 163625878, 
	tf_ecx = -207, tf_eax = 98, tf_trapno = 3, tf_err = 2, 
	tf_eip = 1210560515, tf_cs = 23, tf_eflags = 659, tf_esp = -1077970444, 
	tf_ss = 31, tf_vm86_es = 0, tf_vm86_ds = 0, tf_vm86_fs = 0, 
	tf_vm86_gs = 0})
      at /home/gdt/QUIST-current/netbsd/src/sys/arch/i386/compile/FNORD/../../../../arch/i386/i386/syscall.c:140

apparently in connect()

  #12 0xc0100d4a in syscall1 ()
  can not access 0xbfbf7a30, invalid translation (invalid PDE)
  can not access 0xbfbf7a30, invalid translation (invalid PDE)
  Cannot access memory at address 0xbfbf7a30

  (gdb) fr 5
  #5  0xc025bfff in syn_cache_add (src=0xe3537db0, dst=0xe3537dcc, 
      th=0xc10efb4c, hlen=0, so=0xc0ecbc80, m=0xc10efb00, 
      optp=0xc10efb60 "\002\004\201\234\001\003\003", optlen=20, oi=0xe3537e50)
      at /home/gdt/QUIST-current/netbsd/src/sys/arch/i386/compile/FNORD/../../../../netinet/tcp_input.c:3507
  3507            sc = pool_get(&syn_cache_pool, PR_NOWAIT);
  (gdb) i fr
  Stack level 5, frame at 0xe3537ce0:
   eip = 0xc025bfff in syn_cache_add
      (/home/gdt/QUIST-current/netbsd/src/sys/arch/i386/compile/FNORD/../../../../netinet/tcp_input.c:3507); saved eip 0xc02586d6

So does this mean that 0xc025bfff is the next instruction to be
executed, or that was the one running while the fault occurred?

   called by frame at 0xe3537e60, caller of frame at 0xe3537b60
   source language c.
   Arglist at 0xe3537ce0, args: src=0xe3537db0, dst=0xe3537dcc, th=0xc10efb4c, 
      hlen=0, so=0xc0ecbc80, m=0xc10efb00, 
      optp=0xc10efb60 "\002\004\201\234\001\003\003", optlen=20, oi=0xe3537e50
   Locals at 0xe3537ce0, Previous frame's sp is 0x0
   Saved registers:
    ebx at 0xe3537b78, ebp at 0xe3537ce0, esi at 0xe3537b7c, edi at 0xe3537b80,
    eip at 0xe3537ce4

  0xc025bf9c <syn_cache_add+256>: test   %eax,%eax
  0xc025bf9e <syn_cache_add+258>: je     0xc025bfac <syn_cache_add+272>
  0xc025bfa0 <syn_cache_add+260>: add    $0xfffffff4,%esp
  0xc025bfa3 <syn_cache_add+263>: push   %eax
  0xc025bfa4 <syn_cache_add+264>: call   0xc0202cbc <m_free>
  0xc025bfa9 <syn_cache_add+269>: add    $0x10,%esp
  0xc025bfac <syn_cache_add+272>: mov    %esi,0xa4(%ebx)
  0xc025bfb2 <syn_cache_add+278>: mov    0xffffffe0(%ebp),%eax
  0xc025bfb5 <syn_cache_add+281>: mov    %eax,0x50(%ebx)
  0xc025bfb8 <syn_cache_add+284>: add    $0xfffffff8,%esp
  0xc025bfbb <syn_cache_add+287>: mov    0x1c(%ebp),%edx
  0xc025bfbe <syn_cache_add+290>: push   %edx
  0xc025bfbf <syn_cache_add+291>: push   %ebx
  0xc025bfc0 <syn_cache_add+292>: call   0xc025c250 <syn_cache_respond>
  0xc025bfc5 <syn_cache_add+297>: test   %eax,%eax
  0xc025bfc7 <syn_cache_add+299>: jne    0xc025c240 <syn_cache_add+932>
  0xc025bfcd <syn_cache_add+305>: addl   $0x1,0xc03fe718
  0xc025bfd4 <syn_cache_add+312>: adcl   $0x0,0xc03fe71c
  0xc025bfdb <syn_cache_add+319>: addl   $0x1,0xc03fe6f0
  0xc025bfe2 <syn_cache_add+326>: adcl   $0x0,0xc03fe6f4
  0xc025bfe9 <syn_cache_add+333>: jmp    0xc025c240 <syn_cache_add+932>
  0xc025bfee <syn_cache_add+338>: mov    %esi,%esi
  0xc025bff0 <syn_cache_add+340>: add    $0xfffffff8,%esp
  0xc025bff3 <syn_cache_add+343>: push   $0x0
  0xc025bff5 <syn_cache_add+345>: push   $0xc03fe560
  0xc025bffa <syn_cache_add+350>: call   0xc01f37d0 <pool_get>
  0xc025bfff <syn_cache_add+355>: mov    %eax,%ebx

I can't see how this mov would fault.  Could this be an instruction
still in process from pool_get?

  0xc025c001 <syn_cache_add+357>: add    $0x10,%esp
  0xc025c004 <syn_cache_add+360>: test   %ebx,%ebx
  0xc025c006 <syn_cache_add+362>: jne    0xc025c01c <syn_cache_add+384>
  0xc025c008 <syn_cache_add+364>: test   %esi,%esi
  0xc025c00a <syn_cache_add+366>: je     0xc025c015 <syn_cache_add+377>
  0xc025c00c <syn_cache_add+368>: add    $0xfffffff4,%esp
  0xc025c00f <syn_cache_add+371>: push   %esi
  0xc025c010 <syn_cache_add+372>: call   0xc0202cbc <m_free>
  0xc025c015 <syn_cache_add+377>: xor    %eax,%eax
  0xc025c017 <syn_cache_add+379>: jmp    0xc025c245 <syn_cache_add+937>

I looked at all the pointers given to syn_cache_add, and gdb was able
to print them all ok.

I was not intentionally doing anything very odd.  X was running, and I
noticed the mouse freeze.