Port-sparc64 archive


Re: RC3 LOCKDEBUG panic on 6-way E3k.



On Sun, 22 Mar 2009, Martin Husemann wrote:

On Sun, Mar 22, 2009 at 03:27:40PM +0100, Anders Lindgren wrote:
  I got an RC3 DIAGNOSTICS+DEBUG+LOCKDEBUG kernel running with changes
suggested by Martin. For good measure, I eliminated RAIDframe from the
picture by booting a second install from a different disk and started a
build.sh -j8 release-build on it to kill it. Rather than deadlock hard
within 10 minutes, it now survived 37 minutes -- but then it panicked! But
now I have ddb!

It's running out of mmu contexts on one of the cpus - and something goes
wrong in the code supposed to recover from that (not really a heavily tested
code path). I'll have to read the code again and try to see if I can reproduce
it more quickly on a local machine with an artificially limited number of contexts.

Are we talking about ASI leakage here? Dunno about USII, but USI has (iirc) 4k ASIs.. I haven't looked into how they're handled, but if they're just used round-robin (can't see a reason to do otherwise?), it should be impossible to run out of them unless there are more than 4k concurrent processes?
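
To make the round-robin reasoning above concrete, here is a minimal sketch of such an allocator. It is not the NetBSD sparc64 pmap code: the names (ctx_pool, ctx_alloc, ctx_free), the tiny NCTX value, and the reservation of number 0 for the kernel are all assumptions chosen for illustration. It only shows that a round-robin pool runs dry in two cases: when every number is held by a live owner, or when numbers are never handed back (a leak).

```c
#include <assert.h>

#define NCTX       8     /* hypothetical pool size; the post mentions ~4k */
#define CTX_FREE   (-1)  /* slot is not owned by anyone */
#define CTX_KERNEL (-2)  /* assumption: number 0 is reserved for the kernel */

struct ctx_pool {
    int owner[NCTX];     /* owner[i] is the pid holding number i, or a marker */
    int next;            /* where the next round-robin search starts */
};

void ctx_pool_init(struct ctx_pool *p)
{
    for (int i = 0; i < NCTX; i++)
        p->owner[i] = CTX_FREE;
    p->owner[0] = CTX_KERNEL;
    p->next = 1;
}

/* Hand out the next free number in round-robin order, or -1 if none is free. */
int ctx_alloc(struct ctx_pool *p, int pid)
{
    for (int tried = 0; tried < NCTX - 1; tried++) {
        int c = p->next;
        /* advance the cursor, wrapping to 1 so the reserved slot 0 is skipped */
        p->next = (p->next + 1 < NCTX) ? p->next + 1 : 1;
        if (p->owner[c] == CTX_FREE) {
            p->owner[c] = pid;
            return c;
        }
    }
    return -1;           /* exhausted: the caller needs a recovery path */
}

/* Return a number to the pool; skipping this call is what a leak looks like. */
void ctx_free(struct ctx_pool *p, int c)
{
    assert(c > 0 && c < NCTX);
    p->owner[c] = CTX_FREE;
}
```

With this sketch, a build that churns through many short-lived processes never hits the -1 case as long as each exit calls ctx_free. If that call is missed, the pool is exhausted after NCTX - 1 allocations no matter how few processes are alive at once.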

I see your kmutex_init patch made it into RC4, so I removed my local modification. However, something fishy appears to have sneaked into RC4 too; I built a new RC4 LOCKDEBUG kernel, but it never gets to the ASI leakage bug -- it drops into ddb trying to read address 0x40 in an openfirmware() call from OF_read, coming from pcons_poll. This happens at the "filesystem type (generic)?" question in the boot -a dialogue, right after answering the root- and dump-device questions.

On a different note: I'm looking into getting a remotely controlled relay to the power cord of this E3k box so I can remotely power cycle it. Does anyone know if this could cause damage to the box (as opposed to power cycling it with the key)? It's no worse than a regular power outage, but I'm not sure how healthy that really is. Manuals tend to not recommend it.

/ali:)

