Subject: RE: NetBSD 1.6 & i386 SMP, ACPI, mly
To: 'John Franklin' <franklin@elfie.org>
From: Conrad T. Pino <Conrad@Pino.com>
List: tech-smp
Date: 01/16/2003 20:03:00
Hi John,

> -----Original Message-----
> From: John Franklin [mailto:franklin@elfie.org]
> Sent: Wednesday, January 15, 2003 20:24
> To: Brett Lymn
> Cc: Manuel Bouyer; Conrad T. Pino; i386 Port; SMP
> Subject: Re: NetBSD 1.6 & i386 SMP, ACPI, mly
>
> On Thu, Jan 16, 2003 at 02:04:12PM +1030, Brett Lymn wrote:
> >
[snip]
> >
> > OK - but does that mean it is production ready?  Don't get me wrong,
> > what we have is fantastic and it is deeply appreciated by me (at the
> > very least ;-) but we must be wary of overselling what we have lest we
> > get labelled with the "piece of crap" token.  I have experienced
> > problems with -current SMP on my i386 boxen.  Normally these are
> > solved by a "cvs update" and rebuild but whereas I can live with that
> > on my home boxes, I doubt if it would be very satisfactory for a
> > production environment - at least any production environment that I
> > have had experience with would not tolerate it.
>
> Neither NetBSD nor any other open source OS of which I'm aware does heavy
> regression testing or has a serious suite of tests against which the
> kernel and userland are tested.  There is the /usr/src/regress directory
> which does have some tests, but it's pretty sparse.

Thank you, I didn't know this.

> "Production code" for open source OSes means the code maintainers have stopped
> adding new functionality into the system and started the bug hunt.  The
> community as a whole assists by running the feature frozen code on a
> wide variety of systems in a wide variety of environments for a wide
> variety of purposes and reports any problems, with patches where
> possible.  This generally takes several months of world-wide effort.

This sounds like a description of the commercial industry process.  Am I
correct in understanding that this is what the NetBSD community does when
it is attempting to create a "release" or a "STABLE"?

Is there a document that describes existing CVS labels in the NetBSD code
base and the code branch relationships?

> All it really means is that "production code" is "statistically stable."
> Nobody in the NetBSD community can guarantee the code beyond, "it works
> pretty well on my systems, I haven't heard of any major problems."

Thank you, I didn't know this.

> "Statistically stable" and "works well on my systems" still produces
> high quality code for the simple reason that code quality is rated on
> exactly those two criteria: How often does a failure occur and how high
> is the performance on observed systems.

This observation isn't immediately obvious but once stated it becomes so.

> Code in -current is never "statistically stable" as it constantly sees
> new problems.  During the week the MPACPI code was being added com* and
> lpt* interrupts were broken.  Broken interrupts automatically preclude
> code from being placed on production servers.  This breakage was
> expected, though, as the interrupt code was exactly what the MPACPI was
> affecting.  By the end of the week MPACPI was cleaned up and all was
> well again, better even.[1]

This is very useful to me.  The lack of stability in -current rules it
out for my goals.  Do you have opinions on when SMP support might show
up in a stable or release branch?

Please expand on or define MPACPI.  Thank you.

> Still, that week interrupts were broken.  Another week it might be MP
> code or PCI busses, or SCSI adapters.  Next week, for example, you can
> expect some problems with the scheduler when the scheduler activations
> code is folded into -current, but by the end of next week there will be
> some very happy NetBSD hackers as the issues are ironed out (not to
> mention a couple blokes richly deserving of some pints.)

Thank you, I didn't know this and it's very helpful you shared this.

An implied corollary is that attempting to use -current in production
will create a turbulent life for those who so dare.

I don't expect to do NetBSD development work anytime soon, but as an end
user & application developer I am curious: how can I contribute to -current
progress without placing my development goals at high risk?

> If you want -current code that is production ready, you'll need to
> statistically verify it yourself.  This means setting up lots of servers
> and running a suite of tests on them, while simultaneously watching the
> patches on the main trunk and selectively adding them.  This can be a
> lot of work, especially since you'll be tempted to fold in new features.
> If a normal, sanctioned release cycle takes months, you can expect your
> work to take at least as long.

This is clearly beyond my resource capabilities.

> All that said, -current is still a remarkably stable system, more so
> than some commercial OS releases of yesteryear.  How high a quality
> system you need for your production environment, how much time and
> resources you're willing or able to devote to it, how hard your
> requirements are... these all determine for you if it's "production
> code."

Yes, "production code" is context dependent.  In my case the key criteria
are stable operation of Oracle 8i and Tomcat 4.  I'm starting with Solaris
8 Intel edition but since Sun revised their free binary license to cut
the maximum CPU count from 8 to 1, I've become interested in open source
UNIX with SMP support.

Linux has a lot of new feature action but I was unhappy with the security
vulnerabilities and got hacked once.  FreeBSD was better but that got
hacked eventually because I didn't update the code.  It's become clear
that I need to become a source code user instead of relying only upon
binary distributions if I plan to stay with an open source OS.

After reviewing Linux, FreeBSD, OpenBSD & NetBSD once more, I've concluded
NetBSD's project goals best fit my needs and I want to see a stable SMP
release.  Given that my focus is on the application side, what can I do to
assist progress on the OS side?

> Good luck!
>
> jf
> [1] See my prior posts on USB & MP on my quirky VIA-based MB.
> --
> John Franklin
> franklin@elfie.org
> ICBM: 35°43'56"N 78°53'27"W

Thank you,

Conrad Pino