Subject: Re: Panic killing a process
To: Julio M. Merino Vidal <jmmv@menta.net>
From: Greg 'groggy' Lehey <grog@NetBSD.org>
List: current-users
Date: 01/08/2005 12:05:32

On Thursday,  6 January 2005 at 23:02:16 +0100, Julio M. Merino Vidal wrote:
> Hi all,
>
> I just noticed a 'famd' process running on my box consuming ~100% of CPU.
> When I did 'kill <pid>' (as a regular user), the system completely froze
> (well, it entered ddb, because I typed 'sync' and it properly saved a
> core).  I was under X, so I couldn't see anything else.
>
> Now, analyzing the kcore, I see the following:
>
> (gdb) bt
> #0  0x1fefc000 in ?? ()
> #1  0xc02a0942 in cpu_reboot (howto=256, bootstr=0x0)
>     at /usr/src/sys/arch/i386/i386/machdep.c:751
> #2  0xc01f3824 in db_sync_cmd (addr=1, have_addr=0, count=-1070753826,
>     modif=0xcbe5db44 "\200\210=À[ÛåË\001") at /usr/src/sys/ddb/db_command.c:750
> #3  0xc01f3273 in db_command (last_cmdp=0xc039e53c, cmd_table=0xc0336620)
>     at /usr/src/sys/ddb/db_command.c:464
> #4  0xc01f2f86 in db_command_loop () at /usr/src/sys/ddb/db_command.c:255
> #5  0xc01f6078 in db_trap (type=6, code=0) at /usr/src/sys/ddb/db_trap.c:101
> #6  0xc029e0aa in kdb_trap (type=6, code=0, regs=0xcbe5dd88)
>     at /usr/src/sys/arch/i386/i386/db_interface.c:225
> #7  0xc02a87b0 in trap (frame=0xcbe5dd88)
>     at /usr/src/sys/arch/i386/i386/trap.c:270
> #8  0xc010ae6f in calltrap ()
> #9  0xc021318d in fdfree (p=0xca72ce5c)
>     at /usr/src/sys/kern/kern_descrip.c:1284
> #10 0xc0217141 in exit1 (l=0xca72ba50, rv=15)
>     at /usr/src/sys/kern/kern_exit.c:267
> #11 0xc0226052 in postsig (signum=15) at /usr/src/sys/kern/kern_sig.c:1849
> #12 0xc02a8a74 in trap (frame=0xcbe5dfa8) at /usr/src/sys/sys/userret.h:93
>
> Any idea?  Anyone want to debug?

You could start by looking at what's going on in frame 9 (fdfree, at
kern_descrip.c:1284).  Look at the local variables there.  On the face
of it I'd guess a null pointer dereference, but you should also have
messages from trap() telling you what happened.
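
As a rough sketch only (these are standard gdb commands; what they
print depends on your kernel and core, so no output is shown), that
would look something like:

  (gdb) frame 9
  (gdb) info args
  (gdb) info locals
  (gdb) list
  (gdb) print *p

Here p is the argument shown for fdfree() in your backtrace.  If gdb
can't read it, or one of the locals is 0, that would fit the null
pointer guess.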

You might also like to take a look at
http://www.lemis.com/grog/Papers/Debug-tutorial/slides.pdf and
http://www.lemis.com/grog/Papers/Debug-tutorial/tutorial.pdf.

Greg
--
See complete headers for address and phone numbers.
