Subject: is 1.6 MP code stable?
To: None <port-alpha@netbsd.org>
From: Stephen M. Jones <smj@cirr.com>
List: port-alpha
Date: 11/10/2002 20:34:05
The CS20s I am working with had fresh installs of the 1.6
release and a tailored kernel built from a config file with MP
enabled. After 3 to 5 hours of regular use (not just
running idle) the machines would hang without displaying
a kernel panic on the console or logging one. A side
effect was that the ethernet ports were constantly asserting
their ports on the two ethernet switches causing the 5
other machines to be unreachable.
I took the CS20s home and tried two scenarios to attempt to
recreate the hang. With both, I was able to get the machines
to hang:
(I apologise if these seem crude or cruel .. you should see
what users have tried to do in the past!)
ethernet I/O intensive: rcp'ing a 115 MB file between both
machines on both ethernet ports continuously.
CPU / disk I/O intensive: A shell script that forked a cat of
the files in /usr to /dev/null. (this kept the proc table
at about 6000-7500 processes and ran the load at about 900.0)
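A minimal sketch of what the CPU / disk scenario might look like follows. The original script was not posted, so the function name spawn_readers, its two arguments and the cap on concurrent readers are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: not the script from the test runs described above.
# spawn_readers, DIR and MAX are hypothetical names.

# spawn_readers DIR MAX
#   Fork one background "cat FILE > /dev/null" per regular file under DIR,
#   waiting for the current batch whenever MAX readers have been started,
#   then print the total number of readers forked.
spawn_readers() {
    find "$1" -type f -print | {
        max=$2
        started=0
        running=0
        while IFS= read -r f; do
            cat "$f" > /dev/null 2>&1 &
            started=$((started + 1))
            running=$((running + 1))
            if [ "$running" -ge "$max" ]; then
                wait            # let this batch finish before forking more
                running=0
            fi
        done
        wait                    # collect whatever is still running
        echo "$started"
    }
}
```

Something like `spawn_readers /usr 500` run in a loop would exercise the same code paths. The cap is only there so the sketch can be tried without exhausting memory; reaching the process counts mentioned above would need a much larger MAX.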
The result in both scenarios was that the machines would just hang
at random .. I kept running these tests until each machine had hung
about 10 times.
So I compiled a kernel with single CPU support. I've been
running the same tests for about 13 hours now and haven't had
a hang. The only time I did get a panic was when I had exhausted
memory, so I changed my script to keep the load at about 30.0.
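One way a script could hold the load near a target is to check the 1-minute load average before forking more work. The sketch below is an assumption about how that might be done, not the change actually made to the script; load_below, current_load and wait_for_load are hypothetical helper names, and current_load assumes uptime prints "load average:" or "load averages:" followed by the 1-minute figure.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for throttling on load average.

# load_below CURRENT LIMIT
#   Succeed (exit 0) when CURRENT is numerically less than LIMIT.
#   awk does the comparison because sh has no floating point arithmetic.
load_below() {
    awk -v cur="$1" -v lim="$2" 'BEGIN { exit !(cur + 0 < lim + 0) }'
}

# current_load
#   Print the 1-minute load average parsed from uptime(1).
#   Assumes the usual "load average(s): X, Y, Z" layout.
current_load() {
    uptime | sed -e 's/.*load average[s]*: *//' -e 's/[, ].*//'
}

# wait_for_load LIMIT [INTERVAL]
#   Sleep in INTERVAL-second steps (default 5) until the load drops
#   below LIMIT.
wait_for_load() {
    while ! load_below "$(current_load)" "$1"; do
        sleep "${2:-5}"
    done
}
```

Calling `wait_for_load 30.0` before each new batch of readers would keep the machine from climbing much past that figure, though the load average lags behind the real process count, so some overshoot is to be expected.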
I have an AS1200 that I've been running 1.6 on with MP support.
Its only real task is running TOPS-20 .. when it does crash, it's
always been an "fpsave ipi didn't" message followed by tlp0
timeouts .. it always requires a hard reset.
A CS20 developer has strongly recommended I run Debian Linux .. but
I just can't do that. I can't even think about doing that.
I'm curious if others are using MP and what sorts of results they've
had and what sorts of applications they are using their machines for.