Subject: bin/11095: /usr/bin/w core dumps in some circumstances
To: gnats-bugs@gnats.netbsd.org
From: hag@linnaean.org
List: netbsd-bugs
Date: 09/28/2000 07:49:22
>Number: 11095
>Category: bin
>Synopsis: /usr/bin/w core dumps in some circumstances
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Sep 28 07:55:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator: Daniel Hagerty
>Release: 1.5_ALPHA2-2000-09-02
>Organization:
Distinct lack of
>Environment:
System: NetBSD neuralgia.linnaean.org 1.5_ALPHA2 NetBSD 1.5_ALPHA2 (NEURALGIA) #90: Mon Sep 11 17:42:25 EDT 2000 hag@neuralgia.linnaean.org:/fs/neuralgia/home/hag/work/hacking/os/netbsd/vlan/workarea/sys/arch/i386/compile/NEURALGIA i386
Architecture: i386
>Description:
I'm seeing intermittent core dumps from "w". They seem to be
somewhat correlated with one particular user being logged in, but not
consistently. It will probably be difficult to reproduce in a test
environment.
This is what who(1) reports during the failure window:
hag ttyE0 Sep 23 23:06
hag ttyp1 Sep 25 10:46 (w202.z216112045.)
roland ttyp2 Sep 23 21:00 (adsl-64-160-53-1)
hag ttyp4 Sep 23 23:06 (neuralgia.linnae)
hag ttypb Sep 28 10:36 (w202.z216112045.)
roland ttypd Sep 26 05:41 (adsl-64-160-53-1)
roland ttyq3 Sep 27 22:30 (adsl-64-160-53-1)
This is a gdb session running `w' during the same period:
(gdb) run
10:41AM up 4 days, 13:50, 7 users, load averages: 0.42, 0.29, 0.22
USER TTY FROM LOGIN@ IDLE WHAT
hag E0 - Sat11PM 2:04 xinit /home/neuralgia/hag/.xinitrc --
hag p1 w202.z216112045. Mon10AM 2days -bash
roland p2 adsl-64-160-53-1 Sat09PM 5:42 emacs
hag p4 neuralgia.linnae Sat11PM 2:33 -bash
hag pb w202.z216112045. 10:36AM 3 -bash
hag pd adsl-64-160-53-1 Tue05AM 0 /fs/neuralgia/home/hag/work/hacking/os
Program received signal SIGSEGV, Segmentation fault.
0x480af36e in vfprintf ()
(gdb) bt
#0 0x480af36e in vfprintf ()
#1 0x480cfbca in printf ()
#2 0x8049f1a in main (argc=0, argv=0xbfbfd8a8) at w.c:322
#3 0x8048d71 in ___start ()
(gdb) frame 2
(gdb) list 322,326
322 (void)printf("%-*s %-2.2s %-*.*s ",
323 lognamelen, ep->kp->p_login,
324 (strncmp(ep->utmp.ut_line, "tty", 3) &&
325 strncmp(ep->utmp.ut_line, "dty", 3)) ?
326 ep->utmp.ut_line : ep->utmp.ut_line + 3,
(gdb) print ep->kp
$12 = (struct kinfo_proc2 *) 0x0
For whatever reason, ep->kp is NULL and is being dereferenced at
w.c:322. The code is written as if kp != NULL were an invariant.
I will look into this more deeply as my time allows; perhaps
someone knows the problem offhand just by looking at it. As far as I
know, libkvm is consistent with the rest of userland/kernel build, but
it's always possible that it isn't.
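One possible defensive change, sketched here under assumptions: treat a
NULL kp as "no process matched" and print a placeholder rather than
dereferencing it. The struct and function names below (proc_info, entry,
entry_login, format_entry) are hypothetical stand-ins, not the real w.c
or struct kinfo_proc2 definitions. This would only hide the symptom; it
does not explain why kp ends up NULL in the first place.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the part of struct kinfo_proc2 used here. */
struct proc_info {
	char p_login[32];
};

/* Hypothetical, simplified stand-in for w.c's per-login entry. */
struct entry {
	const char *line;	/* tty name as displayed, e.g. "p2" */
	struct proc_info *kp;	/* NULL when no process was matched to the tty */
};

/* Return the login name, or "-" when no process was matched. */
static const char *
entry_login(const struct entry *ep)
{
	return (ep->kp != NULL) ? ep->kp->p_login : "-";
}

/* Format the start of one output row into buf; returns snprintf's result. */
static int
format_entry(char *buf, size_t len, int lognamelen, const struct entry *ep)
{
	return snprintf(buf, len, "%-*s %s", lognamelen, entry_login(ep),
	    ep->line);
}
```

With a guard like this, an entry whose kp is NULL would show "-" in the
USER column instead of crashing inside printf.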
>How-To-Repeat:
Convince Roland McGrath to log in, and wait for the
appropriate phase of the moon.
>Fix:
None known right now; I just wanted to get this filed. I'll do
some digging in my copious free time.
>Release-Note:
>Audit-Trail:
>Unformatted: