Subject: Re: ffs compatibility added, fsck may complain
To: William Allen Simpson <wsimpson@greendragon.com>
From: Darrin B. Jewell <dbj@netbsd.org>
List: current-users
Date: 03/14/2004 13:30:02
One more thing I noticed after re-reading Perry's scenario:

I would copy the fsck_ffs.repair temporarily into place as
/sbin/fsck_ffs or else mark the filesystems as `do not fsck' in
/etc/fstab until the system reboots and it is safe to upgrade the rest
of your userland.
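
As a rough, untested sketch of the fstab route: fsck skips any entry
whose sixth field (the pass number) is 0. The helper name fstab_nofsck
and the scratch path below are made up for illustration. The helper
only prints an edited copy and never touches /etc/fstab itself.

```shell
# Hypothetical helper: print a copy of the fstab given as $1 with the
# fsck pass number (6th field) set to 0 on every ffs entry, so that
# fsck skips those filesystems at boot.  Comment lines and entries
# with fewer than six fields are printed unchanged.
fstab_nofsck() {
    awk '!/^[ \t]*#/ && NF >= 6 && $3 == "ffs" { $6 = 0 } { print }' "$1"
}

# Example (assumed scratch path): write the edited copy out for review.
# fstab_nofsck /etc/fstab > /tmp/fstab.nofsck
```

Review the output before copying it over /etc/fstab, and restore the
original pass numbers once the new userland is installed.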

Darrin

"Darrin B. Jewell" <dbj@netbsd.org> writes:

> I've been meaning to add an option to fsck to downgrade the
> filesystem, but even that won't completely help your blind upgrade
> problem.  The real answer is that booting a broken -current kernel in
> a blind upgrade situation is dangerous, but that happened months
> ago, and we're past that already.
> 
> So, my current best guess as to a blind upgrade path would
> be something like:
> 
> Modify a -current fsck_ffs so that it does not try to
> remount a filesystem after it has repaired it.  The following
> patch should do this:
> 
> --- src/sbin/fsck_ffs/main.c.~1.49.~      Sat Jan 17 17:17:07 2004
> +++ src/sbin/fsck_ffs/main.c      Sun Mar 14 11:13:24 2004
> @@ -373,7 +373,7 @@
>                 pwarn("\n***** FILE SYSTEM WAS MODIFIED *****\n");
>         if (rerun)
>                 pwarn("\n***** PLEASE RERUN FSCK *****\n");
> -       if (hotroot()) {
> +       if (0 && hotroot()) {
>                 struct statfs stfs_buf;
>                 /*
>                  * We modified the root.  Do a mount update on
> 
> Compile that fsck_ffs on your existing system by cd'ing
> into src/sbin/fsck_ffs and running:
>   make USETOOLS=no DESTDIR=/
>   cp ./fsck_ffs /root/fsck_ffs.repair
>   make USETOOLS=no DESTDIR=/ cleandir
> 
> Compile a -current kernel.
> Stop all processes you can without removing access to the machine.
> Unmount all possible filesystems except for / and /usr.
> Copy the new kernel into place.
> For the root and /usr filesystems, downgrade the mount to read-only.
> If the mount downgrades fail, don't try to force them.  Put
> the old kernel back in place and look around for processes that
> are still writing the disk.
> 
> Wave a few dead chickens.  Type sync.  Wait a few seconds.  Type it
> again.  It should return immediately.  If you can, verify that sync
> does not introduce any disk activity.  iostat -x can be useful for
> this.
> 
> Verify that the filesystems are clean and up to date on disk first
> by running
>   /root/fsck_ffs.repair -n -f -b 16 -c 3
> If this reports any required changes to the filesystems, stop.
> Re-mount the filesystem read-write and put the running
> kernel back in place.   Ask here about how to proceed before
> continuing with the upgrade.
> 
> Upgrade the filesystems by running:
>   /root/fsck_ffs.repair -b 16 -c 4
>   on the raw devices of your filesystems.
> 
> Be gentle while doing this.  You don't want the old kernel
> to try to access any newly upgraded superblocks.  Even a read
> only access may cause a panic.  Hopefully, for filesystems
> which are already mounted read-only, it will not need to
> go back to disk to re-read the superblock.
> 
> reboot
> 
> I would test this upgrade path on systems that are not
> in a blind upgrade situation first.  I have not tested
> this upgrade path.
> 
> Good luck.
> 
> Darrin
> 
> William Allen Simpson <wsimpson@greendragon.com> writes:
> 
> > I never saw an answer to Perry's (and my, and I'm sure many others')
> > problem with blind updating co-lo space to more recent -current:
> > 
> > "Perry E. Metzger" wrote:
> > > 
> > > Also, the situation is REALLY unfortunate. It means that you're going
> > > to end up with machines mysteriously failing on people without much
> > > recourse in the field if you don't happen to remember the cure. Also,
> > > people needing to blind upgrade boxes in colos will get screwed -- I'm
> > > one of those.
> > > 
> > > Is there any way to either get the kernel to fix this for you during
> > > boot, or to provide a way to fix it in advance so that fsck doesn't
> > > fail during reboot? This is actually pretty important.
> > > 
> > I'm trying to get ready to test -current in preparation for 2.0, but 
> > I'm not sure that everything will be hunky-dory after simply installing 
> > a new kernel, reboot, tar zxpf base.tgz et alia, reboot.
> > 
> > As Perry suggests, is there a way to fix it in advance?
> > -- 
> > William Allen Simpson
> >     Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32