Subject: Re: /rescue, crunchgen'ed?
To: Matthew Orgass <darkstar@pgh.net>
From: Richard Rauch <rauch@rice.edu>
List: current-users
Date: 09/02/2002 10:41:14
> > I'm sure you can find a few machines for which this actually matters.
> > Crunchgen may be preferable for them.  They may even want their / to have
> > crunchgen binaries (from the numbers someone listed, I think that that
> > saves even more space than shared, yes?).  People in such "extreme"
> > environments will presumably want or need to do some extra work to get
> > things working anyway; a default setup that they can override (and which
> > doesn't actively impede them installing a working system) seems like the
> > appropriate response, to me.
>
>   The real question is, what do you gain by not crunching them?  If you
> are afraid something will happen to /rescue, copy it.  You should be able
> to make at least five copies in less space than the uncrunched version

How is failover to be accomplished?  Is it automatic?  (I am not familiar
with the boot process to single-user---surely some stuff must run even
before we get to single-user, yes?  If your primary /rescue is hosed, will
the kernel know to fail over to /rescue2, etc.?  Or is single-user (up
to the shell) running on the bare kernel and you can just edit your paths
for /bin and /sbin?  Please forgive my ignorance on this point...(^&)

I'd actually feel better about putting the rescue stuff on a separate
(bootable) partition and never normally mounting it.  This lends itself
directly to multiple copies (even on multiple disks---or at least one on a
distinct disk---for the truly paranoid).  It only depends upon fairly
crude disk structure being maintained/obeyed, and upon having a working
bootloader.


> would take, and this would be five complete copies of everything.  If you
> are worried that it will suddenly stop working, run a boot or cron job to
> test it.  Wasting space for no good reason is never a good policy, no
> matter how much you have available.
>
>   Note also that simply testing new /lib libraries before installing them
> will prevent 90% of the problems that /rescue is needed for (or,
> alternately, keeping a "last known good" copy of the libraries to use in

Well, I'm speaking for my own use, here, and hoping that it scales
somewhat to the general case.  I use NetBSD at home and on a laptop I take
into my office.  I use NetBSD because it behaves logically, and I like the
goals it sets of clean code, etc.  But I am not a serious sysadmin; I've
never been paid two bits, one thin dime, a red cent or even a plugged
nickel, for maintaining my NetBSD systems and network.  I'm an end-user.
(^&

I am not at all concerned about lossage during system updates.  I never
have been.  My approach to system updates tends to be: make an extra backup,
then wipe the disk, install, restore from backup (including relevant
configs) and move forward.

On the other hand, I've never had any failure that would be affected by
anything like /rescue, so this is all somewhat hypothetical to me (hence
my detached point of view in this discussion).


> case the new libraries fail).  This can easily be done with the current
> system.  An easy way to test a new linker on a running system would

I can see that this would be a problem for people tracking -current.
That, I assume, is a minority.  (A significant minority, both in numbers
and importance, worthy of having some special support.  But planning a
feature for a release that is really geared for -current users is perhaps
not the best of ideas.)

Although this thread is taking place on current-users, I get the
impression that /rescue is intended to make its way into a release
someday.  For those of us not running -current, the "update" procedure is
rarely engaged in, can be (and generally should be) done carefully---even
painstakingly---and is almost guaranteed to run as well as the old system.
I have doubts that lossage during updates is a concern for non-current
users.  Concerns about lossage, I assume (though I'm guilty of projecting
my own thoughts here; (^&) relate more to failure of an installed system.
The problem isn't one of being able to "undo" to a safe level, but rather
one of having something robust enough to have a good chance of working
provided that the disk is still physically intact.

Of course, the flip side is: If -current users really *need* something
like /rescue to undo unfortunate updates, there's no harm in letting other
users also get some marginal benefit from it.  But, IMHO, the rest of the
user base will get more benefit from it if it is thought of in terms of
recovery after failure, rather than recovery after update.


Re. your lament that people are discussing ways to improve /rescue: IMHO,
discussing ways to make improvements serves ends of its own.  Clearly,
people feel strongly about this issue in some pockets.  And, people seem
to have firm ideas about what is more stable, and what is less stable.
And, as well, about where failure is more likely to occur---and even about
whether it is better to be more likely to have a "bigger" set of
potentially corrupt blocks that is less likely to be catastrophically
broken (due to redundancy) or a "smaller" set that is less likely to have
anything corrupt, but more likely to be catastrophic if anything is hit.
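
To put rough numbers on that trade-off, here is a minimal sketch in Python.
It assumes each disk block goes bad independently with the same probability,
that a crunched /rescue is one binary which is useless if any of its blocks
is hit, and that an uncrunched set is a collection of separate, statically
linked tools.  Every figure below (block counts, tool count, per-block
probability) is made up for illustration; none of them comes from this
thread or from measurements.

```python
def p_any_block_bad(p_block: float, n_blocks: int) -> float:
    """Chance that at least one of n_blocks is corrupt (independent blocks)."""
    return 1.0 - (1.0 - p_block) ** n_blocks


def p_all_copies_bad(p_block: float, n_blocks: int, copies: int) -> float:
    """Chance that every one of `copies` identical images has a corrupt block."""
    return p_any_block_bad(p_block, n_blocks) ** copies


# Hypothetical numbers, chosen only to make the comparison visible.
P_BLOCK = 1e-4           # assumed chance that any given block is corrupt
CRUNCHED_BLOCKS = 2000   # assumed size of the single crunched binary
TOOLS = 50               # assumed number of separate uncrunched tools
BLOCKS_PER_TOOL = 200    # assumed size of each uncrunched tool

# "Smaller" set: less likely to be hit, but one hit loses everything.
crunched_lost = p_any_block_bad(P_BLOCK, CRUNCHED_BLOCKS)

# "Bigger" set: more likely to contain some damage somewhere...
uncrunched_some_damage = p_any_block_bad(P_BLOCK, TOOLS * BLOCKS_PER_TOOL)
# ...but a given tool (say, the shell) is lost only if its own blocks are hit.
uncrunched_one_tool_lost = p_any_block_bad(P_BLOCK, BLOCKS_PER_TOOL)

# Five copies of the crunched set: all five must be damaged to lose everything.
five_copies_lost = p_all_copies_bad(P_BLOCK, CRUNCHED_BLOCKS, 5)

if __name__ == "__main__":
    print(f"one crunched copy unusable:      {crunched_lost:.4f}")
    print(f"uncrunched set has some damage:  {uncrunched_some_damage:.4f}")
    print(f"one given uncrunched tool lost:  {uncrunched_one_tool_lost:.4f}")
    print(f"all five crunched copies lost:   {five_copies_lost:.6f}")
```

Under these assumptions both camps are right about their own metric: the
bigger set is more likely to contain damage, and the smaller set is more
likely to be lost outright when it is hit.  Real disk failures tend to be
clustered rather than independent, which this sketch deliberately ignores.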

I made two suggestions; perhaps they are technically inept.  Perhaps I was
wrong to conjure them up, and people who liked them were wrong to like
them.  But from our perspective they would make /rescue better.  A
suggestion that would make things better is surely worth making.  And
explaining why ignoring the suggestion makes sense is, sooner or later,
good for NetBSD's image/advocacy.  (^&  I.e., I believe that the thread
serves a purpose, and should not be disparaged.  At the very least, it
lets people vent a bit; you have to consider human nature in all of this.

Having made my suggestions and stated as best I can their case, I'm not
going to get overly emotional about them, though.  The ideas have been
floated and debated.  Whatever is ultimately done will at least not be
done without having heard (and, I hope, considered) the ideas I offered.


  ``I probably don't know what I'm talking about.'' --rauch@math.rice.edu