Subject: Re: kern/20914: kernel panic in sysctl_procargs()
To: David Laight <david@l8s.co.uk>
From: Andrew Brown <atatat@atatdot.net>
List: netbsd-bugs
Date: 04/08/2003 12:14:38
>Ok, so the break is in this 'memcpy' - which the compiler has inlined
>to a single 'mov' instruction (at sysctl_procargs+0x1fd):

right.

>> /usr/src/sys/arch/i386/compile/CRASH/../../../../kern/kern_sysctl.c:2126
>> 	case KERN_PROC_ARGV:
>> 		/* XXX compat32 stuff here */
>> 		memcpy(&tmp, (char *)&pss + p->p_psargv, sizeof(tmp));
>> c030f590 <sysctl_procargs+0x1f4> 8b 7d bc             	mov    0xffffffbc(%ebp),%edi
>> c030f593 <sysctl_procargs+0x1f7> 8b 87 7c 01 00 00    	mov    0x17c(%edi),%eax
>> c030f599 <sysctl_procargs+0x1fd> 8b 44 28 f0          	mov    0xfffffff0(%eax,%ebp,1),%eax
>> /usr/src/sys/arch/i386/compile/CRASH/../../../../kern/kern_sysctl.c:2127
>> 		break;
>
>Now we don't have the registers of the panic (anyone fancy fixing ddb?)

and i can't get to that frame in gdb.  :-/

>but the above should have extreme difficulty in exploding.

that's what i figured, unless one of the registers was wrong.
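For what it's worth, the faulting instruction is easy to take apart. In AT&T syntax disp(base,index,scale) addresses base + index*scale + disp, and a 32-bit displacement of 0xfffffff0 is -16, so the load reads from %eax + %ebp - 16. My reading of the listing (an interpretation, not something ddb printed) is that %edi holds p, %eax is p->p_psargv loaded from 0x17c(%edi), and pss lives at %ebp - 0x10. Here is a minimal C sketch of the address arithmetic. i386_effective_address() is a made-up name, and any register values you feed it are invented:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of an i386 "disp(base,idx,scale)" memory operand:
 * base + idx * scale + disp, computed modulo 2^32 like the CPU does.
 * Hypothetical helper for this sketch only.
 */
uint32_t
i386_effective_address(uint32_t base, uint32_t idx, uint32_t scale,
    int32_t disp)
{
	/* only scales of 1, 2, 4 and 8 are encodable */
	assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
	/* converting disp to uint32_t is well defined (mod 2^32) */
	return base + idx * scale + (uint32_t)disp;
}
```

With %eax at 0 the load is just &pss on the stack, which is why it shouldn't be able to fault. A garbage %eax (or %ebp) moves the load by exactly that amount.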

>p->p_psargv is a constant - I think it should be 0 for all processes
>and all the time.  The only time it will be wrong (for a pointer that
>has been a valid proc pointer) is if the process has exited and the
>memory page released from the pool back for general use.
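If p_psargv is just a byte offset into pss (and 0, as suggested above, which would put the argv pointer first), then that memcpy() is a plain read of a field at a known offset. Here is a sketch of the pattern, assuming pss is a struct ps_strings copied in from the process with the argv pointer as its first member. struct ps_strings_sketch and read_field_at() are stand-ins for illustration, not the real header declarations:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical stand-in, assumed to mirror the ps_strings layout */
struct ps_strings_sketch {
	char	**ps_argvstr;	/* first argv string pointer */
	int	  ps_nargvstr;	/* number of argv strings */
	char	**ps_envstr;	/* first env string pointer */
	int	  ps_nenvstr;	/* number of env strings */
};

/*
 * Read a pointer-sized field out of *pss given its byte offset, the same
 * shape as memcpy(&tmp, (char *)&pss + p->p_psargv, sizeof(tmp)).
 */
char **
read_field_at(const struct ps_strings_sketch *pss, size_t off)
{
	char **tmp;

	/* an offset outside the struct would read past pss */
	assert(off + sizeof(tmp) <= sizeof(*pss));
	memcpy(&tmp, (const char *)pss + off, sizeof(tmp));
	return tmp;
}
```

An offset that doesn't fall inside the struct would read whatever sits next to pss on the stack instead.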
>
>Any thoughts?
>
>I don't see any need for p->p_psargv and friends - but I can't quite
>see how the value read can be invalid - even though it must be!
>
>OTOH this code is completely borked should the process actually exit!

it's safe to say my x server wasn't exiting.  i've got the core, the
kernel file, and the netbsd.gdb file, but i can't step back up to the
frame in which that "call" resides.  otoh, given:

uvm_fault(0xc06b71c0, 0xcfcf8000, 0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 eip c030f599 cs 8 eflags 10246 cr2 cfcf8e50 ilevel 0
panic: trap
Begin traceback...
trap() at trap+0x21a
--- trap (number 6) ---
sysctl_procargs(cfcd8f18,2,807b000,cfcd8f0c,cfcc8500) at sysctl_procargs+0x1fd
kern_sysctl(cfcd8f14,3,807b000,cfcd8f0c,0) at kern_sysctl+0x4b4
sys___sysctl(cfae3688,cfcd8f80,cfcd8f78,c0ac6340,bdbbd2e0) at sys___sysctl+0x1f2
syscall_plain(1f,1f,1f,1f,4) at syscall_plain+0xab
End traceback...

and

sysctl_procargs(int *name, u_int namelen, void *where, size_t *sizep,
    struct proc *up)
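As an aside on the trap line in that trace: the x86 page-fault error code has bit 0 set for a protection violation (clear for a not-present page), bit 1 set for a write and bit 2 set for a user-mode access. Code 0 therefore means a supervisor-mode read of a page that wasn't present. The 0xcfcf8000 handed to uvm_fault() is also just cr2 (0xcfcf8e50) rounded down to a 4 KB page. Here is a small C sketch of both checks. The macro and function names are invented for the sketch:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u	/* i386 base page size */

/* x86 page-fault error code bits, as pushed by the CPU (names made up) */
#define PFCODE_PROT	0x1u	/* set: protection violation; clear: not present */
#define PFCODE_WRITE	0x2u	/* set: write access; clear: read access */
#define PFCODE_USER	0x4u	/* set: user-mode access; clear: supervisor mode */

/* round a faulting address (cr2) down to the start of its page */
uint32_t
sketch_trunc_page(uint32_t va)
{
	/* the mask trick only works for a power-of-two page size */
	assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);
	return va & ~(SKETCH_PAGE_SIZE - 1);
}

int
fault_was_protection(uint32_t code)
{
	return (code & PFCODE_PROT) != 0;
}

int
fault_was_write(uint32_t code)
{
	return (code & PFCODE_WRITE) != 0;
}

int
fault_was_user(uint32_t code)
{
	return (code & PFCODE_USER) != 0;
}
```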

i can dump arbitrary bits of "memory", like so:

(gdb) x/2x 0xcfcd8f18
0xcfcd8f18:     0x00000277      0x00000001

that's 0x277 (the x server's pid) and 0x1 (KERN_PROC_ARGV), and since name
is a pointer that gets passed in from kern_sysctl() as name + 1, thusly:

        case KERN_PROC_ARGS:
                return (sysctl_procargs(name + 1, namelen - 1,
                    oldp, oldlenp, p));

(gdb) x/3x 0xcfcd8f14
0xcfcd8f14:     0x00000030      0x00000277      0x00000001

the 0x30 is KERN_PROC_ARGS, and since kern_sysctl(), in turn, gets
called from sys___sysctl() like so:

        error = (*fn)(name + 1, SCARG(uap, namelen) - 1, SCARG(uap, old),
            oldlenp, SCARG(uap, new), SCARG(uap, newlen), p);
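Putting those call sites together, each layer hands the next one name + 1 and namelen - 1, so all three functions are looking at the same int array at successive offsets. The sketch below replays that with the values read out of memory (1, 0x30, 0x277, 1). strip_level() and i386_name_addr() are hypothetical helpers, and the address helper hard-codes 4-byte ints because that's what i386 uses, whatever the host is:

```c
#include <assert.h>
#include <stdint.h>

/* drop the leading MIB component, as each sysctl layer does */
const int *
strip_level(const int *name, unsigned int *namelen)
{
	assert(*namelen > 0);
	(*namelen)--;
	return name + 1;
}

/* kernel address of name after 'stripped' levels, with i386 4-byte ints */
uint32_t
i386_name_addr(uint32_t name_base, unsigned int stripped)
{
	return name_base + 4u * stripped;
}
```

Starting from 0xcfcd8f10, one strip gives 0xcfcd8f14 with namelen 3 and two give 0xcfcd8f18 with namelen 2. Those match the kern_sysctl(cfcd8f14,3,...) and sysctl_procargs(cfcd8f18,2,...) frames in the ddb trace.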

and since name is a variable local to sys___sysctl() (whose address is
apparently 0xcfcd8f10), i should be able to find the sys___sysctl()
stack frame somewhere around 0xcfcd8f10.  let's see...pointer,
pointer, int, a couple of size_ts, another pointer, an array of ints
called name (CTL_MAXNAME aka 12 in length), and another
pointer...that's 19 words x 4 bytes = 76 bytes (this is i386), so we
look at this:

(gdb) x/48x 0xcfcd8f00
0xcfcd8f00:     0xc030cae0      0x00040000      0xcfcc8500      0x00040000
0xcfcd8f10:     0x00000001      0x00000030      0x00000277      0x00000001
0xcfcd8f20:     0x0000002e      0x00000000      0xcfcd8f68      0xc0b9b000
0xcfcd8f30:     0x0180d00a      0x2f83d00a      0x00000000      0x00000004
0xcfcd8f40:     0xcfcd8fa0      0xc03a1e3b      0xcfae3688      0xcfcd8f80
0xcfcd8f50:     0xcfcd8f78      0xc0ac6340      0xbdbbd2e0      0xbfbfedbc
0xcfcd8f60:     0x00000004      0x00000000      0x00000006      0xc043f464
0xcfcd8f70:     0xcfcd8fa0      0xc0699fb0      0x00000000      0x00000000
0xcfcd8f80:     0xbfbfedbc      0x00000004      0x0807b000      0xbfbfedb8
0xcfcd8f90:     0x00000000      0x00000000      0xc0ac8500      0xc0ac8500
0xcfcd8fa0:     0xbfbfed5c      0xc0100c6b      0x0000001f      0x0000001f
0xcfcd8fb0:     0x0000001f      0x0000001f      0x00000004      0xbfbfedbc
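To double-check which word lives at which address in that output, here's a throwaway lookup over the first three rows, copied verbatim. dump_word() is just a name for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* first three rows of the x/48x output at 0xcfcd8f00, copied verbatim */
static const uint32_t dump_base = 0xcfcd8f00u;
static const uint32_t dump[] = {
	0xc030cae0u, 0x00040000u, 0xcfcc8500u, 0x00040000u,	/* 0xcfcd8f00 */
	0x00000001u, 0x00000030u, 0x00000277u, 0x00000001u,	/* 0xcfcd8f10 */
	0x0000002eu, 0x00000000u, 0xcfcd8f68u, 0xc0b9b000u,	/* 0xcfcd8f20 */
};

/*
 * Return the 32-bit word at kernel address addr, setting *ok to 0 if the
 * address is unaligned or outside the rows copied above.
 */
uint32_t
dump_word(uint32_t addr, int *ok)
{
	/* unsigned subtraction wraps, so addresses below dump_base fail too */
	uint32_t off = addr - dump_base;

	assert(ok != NULL);
	if ((off & 3u) != 0 || off / 4 >= sizeof(dump) / sizeof(dump[0])) {
		*ok = 0;
		return 0;
	}
	*ok = 1;
	return dump[off / 4];
}
```

For instance, 0xcfcd8f0c gives 0x00040000 and 0xcfcd8f08 gives 0xcfcc8500.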

now...the { 0x1, 0x30, 0x277, 0x1 } is the name array (where the
initial 0x1 is presumably CTL_KERN).  going back to the ddb stack trace,
which says the fourth argument to sysctl_procargs() was 0xcfcd8f0c:
that must be the address of sizep (known here as oldlen), which holds
the 0x40000 value.  since i've dumped the vmspaces of all the processes
from the kernel core, i can say that the ps process was pid 334, which
has a paddr (according to ps) of cfcc8500.  that's also the fifth
argument (up) to sysctl_procargs() in the trace, and it appears at
0xcfcd8f08 in the dump above.  so...where's the thing that i feed to gdb
to step to that stack frame?

-- 
|-----< "CODE WARRIOR" >-----|
codewarrior@daemon.org             * "ah!  i see you have the internet
twofsonet@graffiti.com (Andrew Brown)                that goes *ping*!"
werdna@squooshy.com       * "information is power -- share the wealth."