Re: Machine livelock with latest (4.99.48) kernel on sparc64

To: Rafal Boni <rafal%pobox.com@localhost>
Subject: Re: Machine livelock with latest (4.99.48) kernel on sparc64
From: Tobias Nygren <tnn%NetBSD.org@localhost>
Date: Sun, 6 Jan 2008 23:11:16 +0100

On Sat, 05 Jan 2008 20:12:12 -0500
Rafal Boni <rafal%pobox.com@localhost> wrote:

> Rafal Boni wrote:
> > Rafal Boni wrote:
> >> I just rebooted my trusty Netra T1 with a shiny new 4.99.48 kernel and
> >> thought I'd kick off a userland build.  Things seemed to go swimmingly
> >> for a few minutes, then the machine ground to an unusable state --
> >> userland seems to be mostly non-responsive, though the machine is
> >> pingable, answers a ^T at a tty (well, it seems to be wedged harder
> >> now.. it did for a while after the apparent lockup), and the disk sounds
> >> like progress is being made on the build.
> >>
> >> But, I can't get any echo from a tty anymore, and god forbid I should
> >> want to log in ;)
> >>
> >> Anyone seeing anything similar?  Should I go back to the last-known-good
> >> kernel for a while? ;)
> >>
> >> Machine is a Netra T1 200 -- UltraSPARC-IIe @ 500 MHz with 512MB RAM.
> 
> So I thought I'd give it one more try, and I saw the same thing happen 
> this time with a kernel build (thought I'd see if maybe there was 
> something else in the latest CVS that would help).
> 
> The machine locked up ~ 18:01; it's now 2+ hours later and the disk is 
> still chugging along.  Here's the last thing 'top' on the console said 
> before the hang:
> 
> load averages:  4.95,  4.71,  3.82          up 0 days, 13:48   18:01:34
> 29 processes:  1 runnable, 27 sleeping, 1 on processor
> CPU states:  0.0% user,  0.0% nice,  8.1% system,  3.4% interrupt, 88.5% idle
> Memory: 184K Act, 336K Inact, 6096K Wired, 128K Exec, 328K File, 304K Free
> Swap: 2050M Total, 36M Used, 2014M Free
> 
> Unless top's reporting is just way off (it didn't seem to be at the 
> start), there's a sucking memory leak somewhere -- where'd the other 500 
> MB of memory go?
> 
> DDB's ps/l (as well as backtrace) also shows an interesting fact -- the 
> active LWP is the system idle loop every time I'd ended up in DDB due to 
> this hang.
> 
> --rafal
> 

I can reproduce this with a LOCKDEBUG kernel. I don't have any swap
enabled, so instead of death by VM thrashing I get killed processes.
I guess this could be a problem with the sparc64 atomic ops, since
the uvmexp accounting doesn't add up.
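
To put a number on "doesn't add up", here is a rough sketch of the
arithmetic using the figures from the quoted top output. It assumes the
full 512MB would normally be spread across top's categories, which
overstates things somewhat (the kernel keeps some memory for itself at
boot, and Exec/File may overlap the other categories). The helper name
unaccounted_kb is made up for illustration; it is not a kernel or top
interface.

```python
# Figures copied from the quoted 'top' output, in kilobytes.
top_kb = {
    "Act": 184,
    "Inact": 336,
    "Wired": 6096,
    "Exec": 128,
    "File": 328,
    "Free": 304,
}

PHYS_MB = 512  # RAM stated for the Netra T1 in the original report


def unaccounted_kb(categories, phys_mb):
    """Physical memory (KB) not covered by the reported categories.

    Assumption: categories are treated as disjoint, so the sum is an
    upper bound on what top is actually accounting for.
    """
    return phys_mb * 1024 - sum(categories.values())


missing = unaccounted_kb(top_kb, PHYS_MB)
print(f"reported:    {sum(top_kb.values())} KB")
print(f"unaccounted: {missing} KB (~{missing / 1024:.0f} MB)")
```

Even as an upper bound, the reported categories cover only a few
megabytes, which matches the "other 500 MB" that went missing.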

