netbsd-bugs: kern/5624: crash in pppasyncstart tripping over freed mbuf..

Subject: kern/5624: crash in pppasyncstart tripping over freed mbuf..
To: None <gnats-bugs@gnats.netbsd.org>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: netbsd-bugs
Date: 06/19/1998 17:00:47
>Number:         5624
>Category:       kern
>Synopsis:       crash in pppasyncstart tripping over freed mbuf..
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jun 19 10:05:01 1998
>Last-Modified:
>Originator:     Bill Sommerfeld
>Organization:
	
>Release:        19980617
>Environment:
	
System: NetBSD orchard.arlington.ma.us 1.3F NetBSD 1.3F (ORCHARDII) #21: Fri Jun 12 00:05:13 EDT 1998 sommerfeld@orchard.arlington.ma.us:/d3/NetBSD-current/src/sys/arch/i386/compile/ORCHARDII i386


>Description:
	I've seen multiple crashes in pppasyncstart on a system which
is running a ppp connection over an ssh login (poor man's tunnel..);
it has never happened while I was physically present, and the system in
question was rigged to crashdump rather than drop into ddb.

The system is built with options DIAGNOSTIC, which is evidently why we
crashed at *this* particular place (DIAGNOSTIC fills the first 32 bytes
of freed memory with 0xdeadbeef).
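
For readers unfamiliar with the trick, here is a minimal user-space sketch
of this kind of poisoning. It is not the kernel's code: poison_freed and
looks_freed are hypothetical names, and only the 0xdeadbeef pattern and the
32-byte length are taken from the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_WORD  0xdeadbeefU  /* pattern named in the report */
#define POISON_BYTES 32           /* "first 32 bytes", per the report */

/* Hypothetical helper: overwrite the start of a freed block with the
 * poison pattern, one 32-bit word at a time. */
static void poison_freed(void *block, size_t size)
{
    size_t n = size < POISON_BYTES ? size : POISON_BYTES;
    uint32_t word = POISON_WORD;
    unsigned char *p = block;

    for (size_t off = 0; off + sizeof word <= n; off += sizeof word)
        memcpy(p + off, &word, sizeof word);
}

/* Hypothetical helper: does the first word look like poisoned memory? */
static int looks_freed(const void *block)
{
    uint32_t first;

    memcpy(&first, block, sizeof first);
    return first == POISON_WORD;
}
```

Any later use of a stale pointer into such a block then reads the pattern
instead of plausible data, which is what turns a silent corruption into a
trap.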

The fault as pulled out of the crash dump:

(gdb) where
#0  0x6 in ?? ()
#1  0xf01b4668 in cpu_reboot (howto=0x100, bootstr=0x0)
    at ../../../../arch/i386/i386/machdep.c:1304
#2  0xf012eb8d in panic (fmt=0xf01bd7f6 "trap")
    at ../../../../kern/subr_prf.c:184
#3  0xf01bda69 in trap (frame={tf_es = 0xf0130010, tf_ds = 0xfc3d0010, 
      tf_edi = 0xf0609880, tf_esi = 0xf022dfc4, tf_ebp = 0xfc561d30, 
      tf_ebx = 0x94, tf_edx = 0xdeadbeef, tf_ecx = 0x1, tf_eax = 0xfc3d7000, 
      tf_trapno = 0x6, tf_err = 0x2, tf_eip = 0xf015db24, tf_cs = 0x8, 
      tf_eflags = 0x10202, tf_esp = 0x1f000047, tf_ss = 0xf022e094, 
      tf_vm86_es = 0xf022e098, tf_vm86_ds = 0xc0000000, tf_vm86_fs = 0x1, 
      tf_vm86_gs = 0xf0575648}) at ../../../../arch/i386/i386/trap.c:254

I seem to be missing a few clues as to how to get gdb to actually pull
stuff out of the trap frame and give me a traceback of the fault,
but here's what I can dig out with what I have...

(gdb) list *0xf015db24
0xf015db24 is in pppasyncstart (../../../../net/ppp_tty.c:646).
641                     m->m_len = len;
642                     break;
643                 }
644
645                 /* Finished with this mbuf; free it and move on. */
646                 MFREE(m, m2);
647                 m = m2;
648                 if (m == NULL) {
649                     /* Finished a packet */
650                     break;

disassembling around the fault:

0xf015db1c <pppasyncstart+924>: movl   0x1c(%edi),%edx
0xf015db1f <pppasyncstart+927>: movl   0xf022dc08,%eax
0xf015db24 <pppasyncstart+932>: movl   %eax,(%edx)	<--
0xf015db26 <pppasyncstart+934>: movl   %edx,0xf022dc08
0xf015db2c <pppasyncstart+940>: incl   0xf0234dd4

This appears to be the following part of the _MEXTREMOVE() macro in mbuf.h:

		char *p = (m)->m_ext.ext_buf; \
		((union mcluster *)(p))->mcl_next = mclfree; \
		mclfree = (union mcluster *)(p); \
		mbstat.m_clfree++; \
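
A simplified user-space sketch of that free-list push, to show which
statement lines up with the faulting `movl %eax,(%edx)`. The names cluster,
free_head, free_count and cluster_push, and the 2048-byte size, are
stand-ins chosen for this sketch and not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for union mcluster: a free cluster reuses its
 * first bytes as the link to the next free cluster. */
union cluster {
    union cluster *next;
    char buf[2048];
};

static union cluster *free_head;   /* plays the role of mclfree */
static unsigned long free_count;   /* plays the role of mbstat.m_clfree */

/* Push the cluster starting at p onto the free list. */
static void cluster_push(char *p)
{
    /* Store through p: with p == 0xdeadbeef this is the write that traps. */
    ((union cluster *)p)->next = free_head;
    free_head = (union cluster *)p;
    free_count++;
}
```

The first statement is the only one that dereferences p, so a garbage
ext_buf shows up as a write fault exactly there.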

I believe `m' is %edi, which points at a block of memory filled at the
start with a bunch of mostly 0xdeadbeef values.  The pattern here is
consistent with it being a former mbuf on the freelist:

0xf0609880 <end+4004576>: 0xdeadbeef 0xdead0001 0xf05fb980 0xdeadbeef
0xf0609890 <end+4004592>: 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xdeadbeef
0xf06098a0 <end+4004608>: 0x00000000 0x00000000 0x00000000 0x828ae080
0xf06098b0 <end+4004624>: 0xf0609880 0xf0609880 0x3c2bcb75 0x4dda868e

i.e. the 0001 in the second word is kernel malloc type `1' == M_MBUF;
0xf05fb980 points at another datum like this one.  This is toast when
looked at as an mbuf:

(gdb) p *(struct mbuf *)frame.tf_edi
$7 = {m_hdr = {mh_next = 0xdeadbeef, mh_nextpkt = 0xdead0001, 
    mh_data = 0xf05fb980 "...", mh_len = 0xdeadbeef, mh_type = 0xbeef, 
    mh_flags = 0xdead}, M_dat = {MH = {MH_pkthdr = {rcvif = 0xdeadbeef, 
        len = 0xdeadbeef}, MH_dat = {MH_ext = {
          ext_buf = 0xdeadbeef <Address 0xdeadbeef out of bounds>, 
          ext_free = 0, ext_arg = 0x0, ext_size = 0x0, ext_type = 0x828ae080, 
          ext_nextref = 0xf0609880, ext_prevref = 0xf0609880}, 
	...
}
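
The reading of the first two dumped words at 0xf0609880 can be sketched as
below. This assumes a little-endian layout in which the low 16 bits of the
second word hold the malloc type and the high 16 bits are leftover poison;
the struct and function names are hypothetical and are not the kernel's
freelist definitions.

```c
#include <stdint.h>

#define POISON 0xdeadbeefU  /* fill pattern named in the report */

/* Hypothetical decoded view of the second 32-bit word of a freed block. */
struct free_word {
    uint16_t type;   /* malloc type tag; 1 is M_MBUF per the report */
    uint16_t spare;  /* remaining half of the poison pattern */
};

/* Split the word arithmetically so the result does not depend on the
 * byte order of the machine running this sketch. */
static struct free_word decode_second_word(uint32_t w)
{
    struct free_word f;

    f.type  = (uint16_t)(w & 0xffffU);
    f.spare = (uint16_t)(w >> 16);
    return f;
}

/* Heuristic: poisoned first word, poisoned upper half of the second. */
static int looks_like_freed_block(const uint32_t words[2])
{
    struct free_word f = decode_second_word(words[1]);

    return words[0] == POISON && f.spare == (POISON >> 16);
}
```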

I believe that `p' is %edx, which contains 0xdeadbeef..  

%eax *should* contain the current value of mclfree; however, they
don't match (though this may be a result of it getting trashed before
being dumped).

So, it looks like somehow a freed mbuf wound up on the ppp output
queue; I suspect that the diversion into _MEXTREMOVE occurred purely
because M_EXT==1, which is a flag set in 0xdeadbeef..

Isn't that lovely..
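
The M_EXT observation can be checked with a small sketch. It assumes the
little-endian split seen in the gdb output (mh_type = 0xbeef, mh_flags =
0xdead) and M_EXT == 1 as stated above; hdr_tail, split_poison and
takes_ext_path are hypothetical names, and the fields are unsigned here for
simplicity.

```c
#include <stdint.h>

#define M_EXT 0x0001U  /* "has external storage" flag, value 1 per the report */

/* Hypothetical stand-in for the two 16-bit fields that share one 32-bit
 * word at the end of the mbuf header. */
struct hdr_tail {
    uint16_t mh_type;
    uint16_t mh_flags;
};

/* Split a 32-bit word the way a little-endian machine lays it out:
 * low half first (mh_type), high half second (mh_flags). */
static struct hdr_tail split_poison(uint32_t word)
{
    struct hdr_tail t;

    t.mh_type  = (uint16_t)(word & 0xffffU);
    t.mh_flags = (uint16_t)(word >> 16);
    return t;
}

/* The same kind of test the free path makes before touching ext_buf. */
static int takes_ext_path(struct hdr_tail t)
{
    return (t.mh_flags & M_EXT) != 0;
}
```

Since 0xdead is odd, the low bit is set and the poisoned header looks like
an mbuf with external storage.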

>How-To-Repeat:
	hard to know what factors do it; it seems to happen every
	couple days when it's least convenient for me to debug it..
>Fix:
	uncertain.  opening PR so state doesn't get lost..
>Audit-Trail:
>Unformatted: