Current-Users archive


Re: build.sh soft-halts machine on 4.99.49 kernel, with 4.99.48 userland.



On Friday 11 January 2008, Andrew Doran wrote:
> On Fri, Jan 11, 2008 at 12:36:58PM -0800, Marc Tooley wrote:
> > uname -a:
> >
> > NetBSD shog 4.99.49 NetBSD 4.99.49 (shog) #0: Thu Jan 10 21:01:10
> > PST 2008  root@shog:/v/src-current-build/sys/arch/i386/compile/shog
> > i386
> >
> > The machine itself is:
> >
> > . Pentium D 930 (obviously running in 32-bit mode.)
> > . Intel D945GNT motherboard
> > . 2.5 GB RAM (already run through memtest86+)
> >
> > Kernel is an i386 debug kernel built from sources rsync'd from I
> > believe January 9. Userland was from 4.99.48, which I believe was
> > sync'd and built about two weeks earlier.
> >
> > The symptoms are that the machine will happily churn along doing
> > its thing with a build.sh -j 4 until.. suddenly it'll stop.
> > Interactive bash commands seem to function, until I try to create a
> > new process or run a non-builtin bash command. Then that session
> > will simply never return.
> >
> > I can hit enter and the keyboard is responsive. Breaking into the
> > kernel debugger gives me a list of processes that are sitting in
> > vm_map.
> >
> > I have no backtraces, as there is no panic. It's just sitting
> > there, responsive to keyboard input and *already running* network
> > login sessions; but nothing new gets done.
> >
> > I was trying to watch vmstat -i but I lost my screen session.
> >
> > Hints appreciated.
>
> What does 'ps axs' or 'ps/w' from the debugger say? What kind of file
> system configuration does the machine have?
>
> Cheers,
> Andrew

Hello Andrew,

I will retrieve that information properly for you this evening. "show 
procs" from the debugger showed a normal process list. I pared down 
everything hoping to nail down a particular culprit: I even shut down 
syslogd. The only thing "running" (or not, since everything was paused) 
was the build, some child cc1, and so on.

The filesystem is standard ffs--on top of a RAIDFrame raid0 striped 
between two 500GB drives (total 1TB or so.) The build is happening on 
the raid0e partition, which is a volume spanning the whole raid0.

I'll double-check the actual filesystem type (v1 or v2.. I think it was 
newfs -o time -O 2) tonight, as it's paused again.

One positive thing: I was able to log in just a moment ago and run a 
single command after about five hours of paused-ness.

top showed:

load averages:  4.99,  4.97,  4.92  up 0 days,  6:06   03:08:35
18 processes:  17 sleeping, 1 on processor
CPU0 states: 0.0% user, 0.0% nice, 0.0% system,0.0% interrupt, 100% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system,0.0% interrupt, 100% idle
Memory: 1143M Act, 316M Inact,780K Wired,4384K Exec,1443M File,774M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
    0 root     125    0     0K   14M schedu/0   2:17  0.00%  0.00% [system]
  420 root      85    0   752K 1248K vm_map/0   0:38  0.00%  0.00% cp
  276 root      85    0   756K  876K vm_map/0   0:10  0.00%  0.00% cron
  413 root      85    0   752K 1292K inoded/0   0:06  0.00%  0.00% rm
  302 root      85    0   768K 1372K vm_map/0   0:05  0.00%  0.00% screen-4.0.3
  397 root      85    0   752K 1256K inoded/0   0:05  0.00%  0.00% rm
  115 root      85    0   760K 1012K kqread/0   0:04  0.00%  0.00% syslogd
  464 root      85    0   752K  828K inoded/1   0:04  0.00%  0.00% rm
  299 root      85    0  2828K 2736K vm_map/1   0:02  0.00%  0.00% bash
  454 root      42    0   756K 1168K CPU/0      0:00  0.00%  0.00% top
  497 root      85    0   768K 3084K select/1   0:00  0.00%  0.00% sshd
  425 root      85    0  2828K 2748K vm_map/0   0:00  0.00%  0.00% bash
  310 root      85    0  2828K 2748K wait/0     0:00  0.00%  0.00% bash
  444 root      85    0  2824K 2728K wait/0     0:00  0.00%  0.00% bash
  259 root      85    0   768K 2060K select/1   0:00  0.00%  0.00% sshd

... and then when I tried another command (that would be 'w') it's 
paused-but-letting-me-type-stuff again.
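
For anyone comparing listings like the one above across several hangs, 
here is a minimal Python sketch that tallies processes by wait channel. 
It is only an illustration: the function name tally_wait_channels is 
made up, the column position of STATE is taken from the header shown 
above, and it assumes the part after the slash in STATE is the CPU 
number.

```python
from collections import Counter


def tally_wait_channels(top_lines):
    """Count processes per wait channel from top-style process rows."""
    counts = Counter()
    for line in top_lines:
        fields = line.split()
        # A process row starts with a numeric PID and has at least the
        # ten columns PID..CPU (COMMAND may have wrapped to the next line).
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        state = fields[6]              # STATE column, e.g. "vm_map/0"
        channel = state.split("/")[0]  # assumed: suffix is the CPU number
        counts[channel] += 1
    return counts


# A few rows copied from the listing above.
sample = """\
  420 root      85    0   752K 1248K vm_map/0   0:38  0.00%  0.00% cp
  413 root      85    0   752K 1292K inoded/0   0:06  0.00%  0.00% rm
  299 root      85    0  2828K 2736K vm_map/1   0:02  0.00%  0.00% bash
  454 root      42    0   756K 1168K CPU/0      0:00  0.00%  0.00% top
""".splitlines()

for channel, n in tally_wait_channels(sample).most_common():
    print(channel, n)
```

Run against the full listing, a tally like this makes it easy to see at 
a glance how many processes are stuck in vm_map versus inoded.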



