current-users: Re: LONG - Re: /rescue, crunchgen'ed?

Subject: Re: LONG - Re: /rescue, crunchgen'ed?
To: Greywolf <greywolf@starwolf.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: current-users
Date: 08/30/2002 13:11:20
On Fri, 30 Aug 2002, Greywolf wrote:

> On Fri, 30 Aug 2002, Richard Earnshaw wrote:
>
> I think this is stemming from the point that the whole tree of this
> discussion -- /rescue, and dynamic linking /bin and /sbin (including
> init) -- is pointing out that the way we have it now consists of fewer
> points of failure than the proposed direction we are going (which is
> why I'm grateful for the option to rebuild into the current situation).
>
> I think the init thing is more for localisation, really, since I don't
> see network authentication being possible from init (since it would have
> to plumb the network first, and that doesn't happen in single-user mode).

You assume all auth modules will be using the network. One of the big ones
I have in mind would use dedicated hardware, like securecards or some
other device. There you're talking to a local device, which will be
around. While probably not super-common, these are the kinds of things
that get added as site mandates (i.e., if a site decides to use one, it
tends to require ALL boxes to use it).

> If /bin itself is corrupted, then, yes, okay, we have a small problem :).
> It's a bit less of a miss - and less likely, I think - for anything IN
> /bin to be corrupted.  But if /rescue (the directory), /rescue (the
> filesystem) or /rescue/bin (the crunched binaries) are corrupted, bye, bye,
> baby.

I, at least, am comfortable with this change, as my experience with disks
is that they work great for a while and then die hard. This experience
comes both from working with computers and from working in labs that had
disk-drive technology projects. Thus talking about what happens when ONE
disk block goes bad doesn't make a ton of sense; if one goes bad, chances
are many more will follow, and probably already have by the time we
notice. So while the single-bad-block case can happen, I find it unlikely
to last, since staying in that state would mean no further blocks fail.

Also, the big-iron sysadmin in me thinks that if your root disk is hitting
bad blocks, it's not long for this world. Don't boot from it! Use alternate
media and back it up fast! Read from it only what you have to (i.e., read
each block only once, for the backup).

> If there are no good reasons - not OPINIONS, like mine and Greg's and Mouse's
> and Johnny's, but technical reasons which exceed the level of reason as
> shown by many others (including the aforementioned) - not to do this, then
> let it be done and maybe we can get energies being spent on this to be focused
> on real work like the SMP and SA work which we are, by comparison to the
> entire rest of the UNIX community, WOEFULLY behind on (for whatever reason,
> and I think Frank and Bill Sommerfeld and Nathan are doing decently - I
> don't think they have a lot of help.)

Agreed.

Take care,

Bill