Port-sparc64 archive


re: Severe deadlock issues with 5.0/MP



On Tue, 3 Feb 2009, matthew green wrote:


can you try without the tail -f?  there's a bug where tail -f and
something cause hangs...

  That's such a weird failure mode it's funny. :-D

  I've tried a couple of things recently. Here's a wrap-up:

- Serial console break doesn't work even while the machine is still responding, although it works in OFW, so that's not related to the hang; apparently the kernel is ignoring it. I'll have to find out why; I was always under the impression that it was, on the contrary, darn close to impossible to *get it* to ignore it.

- Variants of build.sh [...] tools kernel=GENERIC.MP distribution:

  a) build.sh -j 8 and output to console: hangs within minutes.
  b) build.sh      and output to console: hangs after a few hours.
  c) build.sh -j 8 > mk.log 2>&1 without tail: same as a)
  d) build.sh      > mk.log 2>&1 without tail: see below
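
For reference, the four variants can be written out as a small dry-run sketch. It only prints the command lines and does not run a build. EXTRA_OPTS is a hypothetical placeholder for the options elided as "[...]" above and is deliberately left empty; the "./build.sh" path and the position of "-j 8" relative to those options are assumptions too.

```shell
#!/bin/sh
# Dry-run sketch: prints the four build.sh invocations (a)-(d); nothing is built.
# EXTRA_OPTS is a hypothetical stand-in for the elided "[...]" options (left empty).
EXTRA_OPTS=""
TARGETS="tools kernel=GENERIC.MP distribution"

variant_cmd() {
    # $1 = variant letter; prints the matching command line.
    # ${EXTRA_OPTS:+...} adds the options plus a space only when EXTRA_OPTS is set.
    case "$1" in
        a) echo "./build.sh -j 8 ${EXTRA_OPTS:+$EXTRA_OPTS }$TARGETS" ;;
        b) echo "./build.sh ${EXTRA_OPTS:+$EXTRA_OPTS }$TARGETS" ;;
        c) echo "./build.sh -j 8 ${EXTRA_OPTS:+$EXTRA_OPTS }$TARGETS > mk.log 2>&1" ;;
        d) echo "./build.sh ${EXTRA_OPTS:+$EXTRA_OPTS }$TARGETS > mk.log 2>&1" ;;
        *) echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

for v in a b c d; do
    printf '%s) %s\n' "$v" "$(variant_cmd "$v")"
done
```

Variants (a)/(c) and (b)/(d) differ only in whether output goes to the console or to mk.log, so comparing each pair separates the effect of -j 8 from the effect of console output.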

The first run of (d) stopped after a few hours with a zombie process named "(sparc64--netbsd)" (truncated name, but the logfile suggests the command was a sparc64--netbsd-install of some html documentation). I was actually able to ^C the build and restart it with build.sh -u, which seemed to crawl along, though subjectively very slowly. Roughly how long is a "distribution" supposed to take on a 400MHz USII? It had been going for several hours despite starting part-way through. This morning I checked the serial console (which was running "top") and its output had stopped again. I was again given one keystroke of input before all signs of life ceased. No ping response.

Is there any chance that this is RAIDframe- and MP-related? If that could be relevant, I'll install a fresh 5.0_RC1 on a single disk and try again. But I suppose I should try a LOCKDEBUG+DIAGNOSTIC kernel first...

Best regards,
ali:)

