Subject: Re: "parked" processes?
To: Andrew Doran <firstname.lastname@example.org>
From: Steven M. Bellovin <email@example.com>
Date: 07/06/2007 23:12:02

On Fri, 6 Jul 2007 23:35:42 +0100
Andrew Doran <firstname.lastname@example.org> wrote:
> On Tue, Jun 26, 2007 at 08:40:12AM +0200, Alan Barrett wrote:
> > On Mon, 25 Jun 2007, Steven M. Bellovin wrote:
> > > I just upgraded my laptop to -current from 18 June. Since then,
> > > I've had several instances of (X11) applications hanging. Two of
> > > them, at least, have been blocked on wchan "parked". Has anyone
> > > else seen this?
>
> Did you also upgrade Firefox or just NetBSD?
>
> > I have often had firefox hang like that. This was firefox-22.214.171.124
> > from pkgsrc, running on an i386 laptop with NetBSD-4.99.20, compiled
> > on the same laptop with a somewhat older version of -current
> > (probably 4.99.16). According to "ps -s", almost all lwps were
> > "parked", but one was waiting for something else (a socket, I
> > think). Locking the screen (via xlockmore) and/or suspending the
> > laptop seemed to have something to do with triggering the bad
> > state, but I didn't do any scientific tests to verify that
> > hypothesis.
> > I recompiled firefox and almost all its dependencies on the same
> > version of -current that I am running, and haven't seen the problem
> > again. This makes me suspect a binary compatibility issue between
> > 4.99.16 and 4.99.20.
> Not to discount the possibility, but I know of no reason for it to be
> the case. There are two possibilities: it is a new bug in NetBSD, or
> it's a bug in Firefox.
> I haven't seen any behaviour like this recently and as far as I know
> there are no remaining synchronization issues in this area but I'll
> get looking. If someone could make available to me a coredump from an
> app in this state, along with version info (from the system and the
> app) it would be really helpful.

It's still happening, with fresh NetBSD, Firefox, etc. OTOH, for
various reasons I'm starting to wonder if that particular machine is
having hardware problems.
--Steve Bellovin, http://www.cs.columbia.edu/~smb