Subject: Re: NetBSD2.0/sparc not ready for prime time?
To: Gert Doering <gert@greenie.muc.de>
From: Seth Kurtzberg <seth@cql.com>
List: port-sparc64
Date: 02/04/2005 11:12:02
Gert Doering wrote:

>Hi,
>
>On Fri, Feb 04, 2005 at 10:15:30AM -0700, Seth Kurtzberg wrote:
>>IMHO, it's foolish to try to use a new release in a production 
>>environment until it has been out about three months.  Even then, it 
>>might not be ready.
>
>Well.  This is how I understand the difference between 2.0 and -current
>- "-current" will have all the ugly things in it, while 2.0 is known
>to work (the short version).
>
>[..]
>>So, the best advice, for any o/s release, is, watch the list and wait, 
>>or put it on a non-production machine and be one of the testers.  Don't 
>>put it on a production machine yet.
>
>That's what I did with the pre-2.0 releases: test it thoroughly on 
>my test sparc64 system.  The "production" system wasn't upgraded until
>some time after the release - and I had no crashes on the test system
>(I had no crashes on the production system either, for the longest time
>- but then I upgraded something in userland, and the system started
>crashing on me).
>
>
>OTOH, this nice "it's all my own fault" discussion doesn't really answer 
>the open question "is it a good thing to backport the -current thread 
>fixes (library and kernel) to 2.0, or will it make everything break 
>even worse"...
>
That's not what I meant at all.  I was just commenting on the complexity 
of the system as a whole.  All I really meant to say was that the 
developers working on the sparc-specific issues are working very hard 
and doing excellent work, and that it is impossible to know in advance 
every configuration in which the release will be used.

I suspect that something was fixed somewhere, and that the fix made your 
system less stable than it was with the pre-release.  It's not a 
matter of fault, and I certainly wasn't implying anything negative about 
you personally.  I'm sorry that you got that impression.  I would not 
say that it is anyone's fault; if anything, the fault lies with the 
unavoidable complexity of a new operating system release.

I can understand why you are frustrated that something came up in the 
release that didn't occur in the pre-release that you tested.  I'm sure 
that the answer will turn out to be that one thing stopped working 
because another thing was changed for a valid reason.  Again, I did not 
mean to imply any insult or anything negative at all about you, or about 
your message.  I was merely saying, and still believe, that the 
complexity of the system is such that a perfect release is impossible.  
Of course, you know that, so perhaps I should say "near perfect" or 
whatever.

There is clearly some interaction here that wasn't anticipated.  In 
fact, judging by the paragraph that follows, you've already identified 
something you rebuilt that appears to be related to the error.  
That again doesn't mean that anything is or isn't your fault; it simply 
means that something occurred that was not anticipated.

That is again _not_ to imply that there aren't problems, or that they 
don't need to be fixed.  It is merely to say that you have good people 
already expending maximum effort, and that, in my experience, initial 
NetBSD releases are of much higher quality than the initial releases of 
other operating systems I've used, including expensive SVR4 releases 
from Sun or HP.

Judging by the activity on the port-sparc and port-sparc64 lists over 
the last couple of weeks, something was missed.  Someone probably made 
what is arguably a mistake.  I don't dispute that a release should not 
be less stable than the pre-release that preceded it.  I'm just saying 
that, at times, such a regression is unavoidable.

My question is always, "Did you _have_ to upgrade to the new release 
immediately?  Is there a compelling reason not to wait?"  If not, I was 
merely suggesting that it is more prudent to wait a while before moving 
to the new release.  I believe that this is simply a common-sense 
approach.

So at no time did I intend to say "it is all your fault"; I was simply 
saying that you might save yourself some aggravation by waiting.  I 
apologize if my words did not match what I intended to say.

>Waiting for advice, I've recompiled libpthread with the patch (went in
>fairly smoothly) and -DPTHREAD_MLOCK_KLUDGE, and it doesn't crash
>the machine immediately :-) - now I'll need to wait and see what will 
>happen over the next few days...
>
>gert