Subject: bin/11095: /usr/bin/w core dumps in some circumstances
To: None <gnats-bugs@gnats.netbsd.org>
From: None <hag@linnaean.org>
List: netbsd-bugs
Date: 09/28/2000 07:49:22
>Number:         11095
>Category:       bin
>Synopsis:       /usr/bin/w core dumps in some circumstances
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Sep 28 07:55:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Daniel Hagerty
>Release:        1.5_ALPHA2-2000-09-02
>Organization:
	Distinct lack of
>Environment:
	
System: NetBSD neuralgia.linnaean.org 1.5_ALPHA2 NetBSD 1.5_ALPHA2 (NEURALGIA) #90: Mon Sep 11 17:42:25 EDT 2000 hag@neuralgia.linnaean.org:/fs/neuralgia/home/hag/work/hacking/os/netbsd/vlan/workarea/sys/arch/i386/compile/NEURALGIA i386
Architecture: i386

>Description:
	I'm seeing intermittent core dumps from "w".  They seem
somewhat correlated with one particular user being logged in, but the
correlation isn't consistent.  It'll probably be difficult to reproduce
in a test environment.

    This is the output of who(1) during the failure window:

hag      ttyE0    Sep 23 23:06
hag      ttyp1    Sep 25 10:46 (w202.z216112045.)
roland   ttyp2    Sep 23 21:00 (adsl-64-160-53-1)
hag      ttyp4    Sep 23 23:06 (neuralgia.linnae)
hag      ttypb    Sep 28 10:36 (w202.z216112045.)
roland   ttypd    Sep 26 05:41 (adsl-64-160-53-1)
roland   ttyq3    Sep 27 22:30 (adsl-64-160-53-1)

    This is a gdb session of `w' from the same time period.

(gdb) run
10:41AM  up 4 days, 13:50, 7 users, load averages: 0.42, 0.29, 0.22
USER  TTY FROM              LOGIN@  IDLE WHAT
hag    E0 -                Sat11PM  2:04 xinit /home/neuralgia/hag/.xinitrc -- 
hag    p1 w202.z216112045. Mon10AM 2days -bash 
roland p2 adsl-64-160-53-1 Sat09PM  5:42 emacs 
hag    p4 neuralgia.linnae Sat11PM  2:33 -bash 
hag    pb w202.z216112045. 10:36AM     3 -bash 
hag    pd adsl-64-160-53-1 Tue05AM     0 /fs/neuralgia/home/hag/work/hacking/os

Program received signal SIGSEGV, Segmentation fault.
0x480af36e in vfprintf ()
(gdb) bt
#0  0x480af36e in vfprintf ()
#1  0x480cfbca in printf ()
#2  0x8049f1a in main (argc=0, argv=0xbfbfd8a8) at w.c:322
#3  0x8048d71 in ___start ()
(gdb) frame 2
(gdb) list 322,326
322			(void)printf("%-*s %-2.2s %-*.*s ",
323			    lognamelen, ep->kp->p_login,
324			    (strncmp(ep->utmp.ut_line, "tty", 3) &&
325			    strncmp(ep->utmp.ut_line, "dty", 3)) ?
326			    ep->utmp.ut_line : ep->utmp.ut_line + 3,
(gdb) print ep->kp
$12 = (struct kinfo_proc2 *) 0x0


    For whatever reason, ep->kp is NULL, and it is dereferenced at
w.c:322 to fetch p_login.  The code is written as if kp != NULL were
an invariant.

    I will look into this more deeply as time allows; perhaps someone
will recognize the problem just by looking at it.  As far as I know,
libkvm is consistent with the rest of the userland/kernel build, but
it's always possible that it isn't.

>How-To-Repeat:
	Convince roland mcgrath to log in, and wait for the
appropriate phase of the moon.

>Fix:
	None known right now; I just thought I'd get it in.  I'll do
some digging in my copious free time.
>Release-Note:
>Audit-Trail:
>Unformatted: