Subject: is 1.6 MP code stable?
To: None <port-alpha@netbsd.org>
From: Stephen M. Jones <smj@cirr.com>
List: port-alpha
Date: 11/10/2002 20:34:05
The CS20s I am working with had fresh installs of the 1.6
release with a tailored kernel via the config file w/ MP
enabled.  After 3 to 5 hours of regular use (not just
running idle) the machines would hang without displaying
a kernel panic on the console or logging one.  A side
effect was that the hung machines' ethernet interfaces were
constantly asserting their ports on the two ethernet switches,
making the 5 other machines unreachable.

I took the CS20s home and tried two scenarios to attempt to
recreate the hang.  With both, I was able to get the machines
to hang:

(I apologise if these seem crude or cruel .. you should see
what users have tried to do in the past!)

ethernet I/O intensive:  rcp'ing a 115 MB file between both
machines on both ethernet ports continuously.

CPU / disk I/O intensive:  A shell script that forked a cat of
the files in /usr to /dev/null.  (this kept the proc table 
at about 6000-7500 processes and ran the load at about 900.0)
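
For anyone who wants to try something similar, here is a rough
sketch of that kind of script.  It is not the script I ran: the
stress_cat name, the temp-file list and the MAXJOBS cap are all
made up for illustration.  It forks one cat per regular file
under a directory and waits for the batch to drain whenever the
cap is reached.

```shell
#!/bin/sh
# Hypothetical sketch, not the original script.
# stress_cat DIR MAXJOBS
#   Fork one "cat FILE > /dev/null" per regular file under DIR.
#   When MAXJOBS cats have been started, wait for all of them to
#   finish before starting more.  Prints the number of cats started.
stress_cat() {
    dir=$1
    max=$2
    list=$(mktemp) || return 1

    # Collect the file names first so the loop below runs in this
    # shell (a "find | while" pipeline would lose the counters).
    find "$dir" -type f > "$list"

    running=0
    started=0
    while IFS= read -r f; do
        cat "$f" > /dev/null 2>&1 &
        running=$((running + 1))
        started=$((started + 1))
        if [ "$running" -ge "$max" ]; then
            wait            # let the current batch drain
            running=0
        fi
    done < "$list"

    wait                    # catch the final partial batch
    rm -f "$list"
    echo "$started"
}
```

Something like "stress_cat /usr 7000" would roughly match the
process counts above; a much smaller cap keeps the load down.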

The result in both scenarios was that the machines would just
hang randomly .. I reran these tests and let them hang about
10 times each.

So I compiled a kernel with single CPU support.  I've been 
running the same tests for about 13 hours now and haven't had
a hang.  The only time I did get a panic was when I had exhausted
memory, so I changed my script to keep the load at about 30.0.

I have an AS1200 that I've been running 1.6 on with MP support.
Its only real task is running TOPS-20 .. when it does crash it's
always been an "fpsave ipi didn't" message followed by tlp0
timeouts .. it always requires a hard reset.

A CS20 developer has strongly recommended I run Debian Linux .. but
I just can't do that.  I can't even think about doing that.

I'm curious if others are using MP and what sorts of results they've
had and what sorts of applications they are using their machines for.