tech-kern: Re: @booted_kernel magic symlink?

Subject: Re: @booted_kernel magic symlink?
To: None <tech-kern@NetBSD.org>
From: Chapman Flack <nblists@anastigmatix.net>
List: tech-kern
Date: 04/27/2006 14:07:24
Steven M. Bellovin wrote:
> Here's a quick-and-dirty hack, very lightly tested.  There's deliberately
> no stop routine to delete the symlink, and it runs before savecore; the
> idea is that if the system crashed, you can save the failed kernel....
[snip]
>	bk=`sysctl machdep.booted_kernel 2>/dev/null | sed 's/.*= //'`
	[snip]
>		ln -s "$bk" "$booted_kernel_flags"

That's very close. The one major point overlooked is that it has to
be ln -s "$kernelfs_mountpoint/$bk" "$booted_kernel_flags", where
kernelfs_mountpoint must be configured by the human administrator
in rc.conf.  Otherwise, you've just republished, as a symlink, the same
information sysctl already provides, and it's missing the same component
(the mount point) that was missing to start with.

Minor nits: ln -sf makes the rm unnecessary; sysctl -n makes the sed unnecessary.
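A minimal sketch of the fragment with those changes folded in, assuming NetBSD's /bin/sh. It keeps the quoted script's use of booted_kernel_flags as the link path and adds the proposed kernelfs_mountpoint. Both are rc.conf variables proposed in this thread, not stock NetBSD ones, and the helper name link_booted_kernel is hypothetical. Stripping a leading slash from the sysctl value is also an assumption, made only in case machdep.booted_kernel reports the name as "/netbsd" rather than "netbsd".

```shell
#!/bin/sh
# Sketch only. kernelfs_mountpoint and booted_kernel_flags are rc.conf
# variables proposed in this thread, not stock NetBSD ones;
# link_booted_kernel is a hypothetical helper name.

# link_booted_kernel MOUNTPOINT KERNEL LINK
# Make LINK point at KERNEL as seen under MOUNTPOINT, i.e. where the
# filesystem the boot loader read the kernel from is mounted.
link_booted_kernel()
{
	_mnt=${1%/}	# drop a trailing slash so "/" + "netbsd" gives "/netbsd"
	_bk=${2#/}	# assumption: tolerate a leading slash in the sysctl value
	_link=$3
	# -f replaces any existing link, so no separate rm is needed.
	ln -sf "$_mnt/$_bk" "$_link"
}

# sysctl -n prints just the value, so no sed is needed. Do nothing unless
# both variables are set and the sysctl node exists (i.e. on NetBSD).
if [ -n "$kernelfs_mountpoint" ] && [ -n "$booted_kernel_flags" ] &&
    bk=$(sysctl -n machdep.booted_kernel 2>/dev/null) && [ -n "$bk" ]; then
	link_booted_kernel "$kernelfs_mountpoint" "$bk" "$booted_kernel_flags"
fi
```

For example, with a hypothetical kernelfs_mountpoint=/kern and sysctl reporting netbsd, the link would point at /kern/netbsd; with kernelfs_mountpoint=/ it would point at /netbsd.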


Wolfgang Solfrank wrote:
> Hmm, wouldn't it make much more sense if it would run *after* savecore?
> That way the symlink would most likely point to the kernel that actually
> crashed, not the one that is currently booting...

The question's tricky, because savecore's behavior suggests it does
some groveling in the currently booted kernel to determine which
device to look at for the core.  I suspect this because on my 2.0 RAID
system, which has ld0 but no wd0, if I boot a kernel other than the one
savecore thinks is running, savecore logs "/dev/wd0a: Device not configured",
which strongly hints that savecore groveled the wrong kernel and got a
bogus value for the device to open.

So now we have a Catch-22: savecore wants to grovel information out
of the kernel that /is/ running, and then it wants to find the one that
/was/ running.  A quick look at the code confirms that it does call
kvm_nlist twice, once on (what it thinks is) the running kernel, once
on (what it thinks is) the dead one.

The first query should probably be done using ksyms, and in fact if I
read the docs right that would only require *removing*

	if (kernel == NULL) {
		kernel = getbootfile();
	}

from savecore.c!  As discussed already, getbootfile(3) is a misguided
and unimplementable function, and if kernel were just left NULL,
kvm_openfiles and kvm_nlist would DTRT and use ksyms.

The next thing savecore does is get the dump device name from the
running kernel and do a new kvm_openfiles on that, passing (what
should be) the name of the /dead/ kernel.  But what it really
passes is the same kernel name it passed the first time. This, I
suspect, is simply a glaring bug.  But note that to DTRT here is
actually hard. You want the name of the kernel that was running.
You would find that name in the dump. But to know where in the
dump to find the name, you need the nlist from the very kernel you
are looking for. I suspect that's why the savecore author
just gave up and DTWT. :/

So in the end I think I do agree (though by a longer chain of
reasoning) with the Solfrank modification. The symlink pointing
to the current kernel is unnecessary (for savecore) because
savecore can and should be modified to use ksyms for that. The
symlink (or some other record) of the last-booted kernel /is/
necessary (though not sufficient without the corresponding mod
to savecore itself), because savecore can't DTRT without it.

And the conclusion is (wrt savecore) that my originally proposed
current-kernel symlink would not meet savecore's needs, while the
Bellovin RC Script (with Solfrank Modification) would, provided
savecore is also fixed to match. And of course the script does
need to include the configurable kernelfs_mountpoint.

-Chap