Subject: kern/24574: fill_kproc2() may do a null pointer de-reference -> panic
To: None <gnats-bugs@gnats.netbsd.org>
From: None <he@netbsd.org>
List: netbsd-bugs
Date: 02/27/2004 13:26:16
>Number:         24574
>Category:       kern
>Synopsis:       fill_kproc2() may do a null pointer de-reference -> panic
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 27 12:27:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Havard Eidnes
>Release:        NetBSD 1.6ZK 24 Feb 2004
>Organization:
	
>Environment:
System: NetBSD splitter-pine.urc.uninett.no 1.6ZK NetBSD 1.6ZK (SPLITTER-PINE) #4: Wed Feb 25 16:23:29 CET 2004  he@splitter-pine.urc.uninett.no:/usr/obj/sys/arch/i386/compile/SPLITTER-PINE i386
Architecture: i386
Machine: i386
>Description:
	We got a panic on a 4-CPU i386 machine:

uvm_fault(0xcdedb428, 0, 0, 1) -> 0xe
kernel: page fault trap, code=0
Stopped in pid 19581.1 (top) at netbsd:fill_kproc2+0x436:       movl    0(%eax),%eax
db{3}> trace
fill_kproc2(ce0a8d20,d2bcb9a4,288,1,cdedb428) at netbsd:fill_kproc2+0x436
sysctl_doeproc(d2bcbefc,4,808f000,d2bcbef0,0) at netbsd:sysctl_doeproc+0x1a2
sysctl_dispatch(d2bcbef4,6,808f000,d2bcbef0,0) at netbsd:sysctl_dispatch+0x5a
sys___sysctl(ce21b854,d2bcbf64,d2bcbf5c,ca,0) at netbsd:sys___sysctl+0xad
syscall_plain(d2bcbfa8,1f,1f,1f,1f) at netbsd:syscall_plain+0x17e
db{3}> show reg
ds          0x10
es          0x10
fs          0x30
gs          0x10
edi         0xd2bcba7c
esi         0
ebp         0xd2bcb95c
ebx         0xd2bcb9a4
edx         0xcdedb190  usb_all_tasks+0xd6dd538
ecx         0xce0a8d20  usb_all_tasks+0xd8ab0c8
eax         0
eip         0xc03527ee  fill_kproc2+0x436
cs          0x8
eflags      0x10246
esp         0xd2bcb924
ss          0x10
netbsd:fill_kproc2+0x436:       movl    0(%eax),%eax
db{3}> 

	A gdb session with the kernel (no debugging symbols...) gives

(gdb) p fill_kproc2+0x436
$2 = (<text variable, no debug info> *) 0xc03527ee <fill_kproc2+1078>
(gdb) 
(gdb) x/20i fill_kproc2
...
0xc03527e6 <fill_kproc2+1070>:  push   %ecx
0xc03527e7 <fill_kproc2+1071>:  call   0xc0363830 <proc_representative_lwp>
0xc03527ec <fill_kproc2+1076>:  mov    %eax,%esi
0xc03527ee <fill_kproc2+1078>:  mov    (%eax),%eax
...


>How-To-Repeat:
	I have so far not found a way to reliably reproduce this
	problem.

>Fix:
	Don't know, but apparently proc_representative_lwp(p) can
	return NULL, and fill_kproc2() is not prepared to handle
	that: the disassembly above shows it dereferencing the
	return value (%eax, which is 0) right after the call.
	proc_representative_lwp() does panic() just before its final
	return(NULL), but two other cases return a value without
	checking it: the first where p_nlwps == 1, the second where
	the state is SZOMB.

	For the time being I've instrumented
	proc_representative_lwp() with a couple of KASSERT()s.
	That is obviously not a fix; it only makes the kernel trip
	a tiny bit earlier.
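
	One possible direction, shown only as a user-space sketch and
	not as a patch against the kernel: have the caller test the
	returned pointer before using it.  Every name below
	(lwp_model, proc_model, kinfo_model, representative_lwp,
	fill_info) is a hypothetical stand-in invented for
	illustration; none of it is the real NetBSD code or its
	structure layout, and whether zeroed fields are the right
	result for such a process is an assumption.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; NOT the NetBSD structures. */
struct lwp_model {
	int l_stat;
};

struct proc_model {
	int p_nlwps;			/* number of LWPs the process claims */
	struct lwp_model *p_first_lwp;	/* head of the LWP list, may be NULL */
};

struct kinfo_model {
	int ki_has_lwp;			/* 1 if the LWP fields below are valid */
	int ki_stat;
};

/*
 * Models a lookup that can hand back NULL: with p_nlwps == 1 it returns
 * the list head unchecked, so an empty list yields a NULL pointer.
 */
static struct lwp_model *
representative_lwp(const struct proc_model *p)
{
	if (p->p_nlwps == 1)
		return p->p_first_lwp;
	return NULL;
}

/*
 * Caller-side guard: check the pointer before dereferencing it and fall
 * back to zeroed LWP fields (an assumed policy) when no LWP is available.
 */
static void
fill_info(const struct proc_model *p, struct kinfo_model *ki)
{
	struct lwp_model *l = representative_lwp(p);

	if (l == NULL) {
		ki->ki_has_lwp = 0;
		ki->ki_stat = 0;
		return;
	}
	ki->ki_has_lwp = 1;
	ki->ki_stat = l->l_stat;
}
```

	In this model, calling fill_info() on a process whose LWP list
	is empty leaves the LWP fields zeroed instead of faulting.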
>Release-Note:
>Audit-Trail:
>Unformatted: