Port-sparc64 archive


Severe deadlock issues with 5.0/MP



I have a 6-way E3000 that I was hoping to run NetBSD on, given the recent SMP successes, so I installed a 5.0_BETA built on Jan 24, apparently just before the RC1 tag hit the tree.

The system is running off a 2x33G RAID1 RAIDframe set; FFS logging is not yet enabled.

In order to test the MP robustness, I started building the latest RC1 sources like so:

./build.sh -j 8 [ -M ... ] tools kernel=GENERIC.MP distribution > /tmp/mk.log 2>&1 &

...and sat back to watch the progress with tail -f. Before gcc had finished building (still in the "tools" stage), all output just stopped. Pressing ^C on the "tail" just echoed "^C" in the terminal window (an ssh session) with no effect, and after that the terminal session was completely dead.

To see whether this was a general issue or some race caused only by concurrent loads, I hooked up a serial console (so I could catch the output of a panic, if any, and maybe drop into ddb) and restarted the build without -j. The result was the same, however: after a few minutes of building, all output stopped. Pressing return a few times did scroll the console contents, but when I pressed ^C on the "tail", I only got a ^C on the console and no prompt, and beyond that point the box was stone dead. It doesn't even respond to a serial console break. It looks almost like a complete deadlock with interrupts blocked or something. No panic occurs.

Is anyone else having success with MP-kernels, or seeing similar problems?

I'll try a DIAGNOSTIC/LOCKDEBUG kernel and see what gives. Is there any other way I can get useful debugging out of it? (E.g. developer access to the serial console could be arranged; I just won't have on-the-fly access to power-cycle it if it hangs, due to its location.)
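
A minimal sketch of the kind of debug kernel config meant here, assuming a hypothetical config file named DEBUG.MP placed next to GENERIC.MP in sys/arch/sparc64/conf. The name and the exact option set are assumptions for illustration, not a confirmed recipe for this hang:

```
# DEBUG.MP (hypothetical name): GENERIC.MP plus extra consistency checks
include "arch/sparc64/conf/GENERIC.MP"

options 	DIAGNOSTIC	# cheap internal consistency checks
options 	DEBUG		# more expensive debugging checks
options 	LOCKDEBUG	# lock usage checks; noticeably slows the kernel
```

It would be built the same way as above, with kernel=DEBUG.MP in place of kernel=GENERIC.MP.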

Best regards,
ali:)

