Subject: Re: Kernel boot 'inappropriate file type'
To: None <port-sparc@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: port-sparc
Date: 10/14/2001 00:35:46
> I suspect that it might have something with the boot loader's
> knowledge what inode to boot from being hardcoded,

I've never seen that.  What normally happens is:

- OpenFirmware loads sectors 1 through 15 (0 origin) of the selected
  partition and executes them.  (There may be some checksumming or
  something; I'm not sure.)  This is the first-stage bootloader; it is
  put in place by running installboot.

- This code uses OF callbacks to load a set of block numbers, wired
  into it when installboot was run.  After loading them, it executes
  them.  (Again, there may be checksumming &c.)  This is the
  second-stage bootloader; it is put in place by copying it to a file,
  normally in /, but installboot must be rerun for the new version to
  actually get used.

- This program (still using OF callbacks to read the disk) loads the
  kernel, based on things such as the name specified on the boot
  command and the possible presence of (eg) -a.  It knows enough to
  walk the filesystem.

Now, one possibility here is that installboot was mistakenly run such
that the first stage bootloader loads the kernel directly.  I've never
tried this and am not certain it would work at all.  However, even if
it would, that does not match the symptoms; in that case, it would boot
the old kernel even after putting the new one in /netbsd (because it
still loads the old one's blocks).

Another possibility is that you have a boot ROM version that always
uses 6-byte CDBs and therefore cannot access beyond the 1G point on the
disk.  If your second-stage bootloader and kernel are all below the 1G
mark, everything works fine - but if not, the high bits of the sector
number get dropped, and you get a "random" disk block instead of the
one you want.  If this happens to be the beginning of the kernel,
you'll get a bad magic number error.  This theory is consistent with
all the symptoms described.
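
To make the dropped-high-bits effect concrete, here is a minimal sketch
in Python.  It assumes 512-byte sectors and relies on the fact that the
6-byte READ CDB has a 21-bit block address field; the function name is
made up for illustration and is not taken from any ROM or bootloader
code.

```python
SECTOR_SIZE = 512      # assumption: 512-byte sectors
LBA_BITS_CDB6 = 21     # the 6-byte READ CDB has a 21-bit block address

def cdb6_effective_lba(lba):
    """Return the sector a 6-byte CDB actually addresses (high bits dropped)."""
    return lba & ((1 << LBA_BITS_CDB6) - 1)

limit = 1 << LBA_BITS_CDB6
print(limit * SECTOR_SIZE)              # 1073741824, i.e. the 1G point
print(cdb6_effective_lba(limit - 1))    # 2097151: last sector that still works
print(cdb6_effective_lba(limit + 100))  # 100: wraps to a "random" low block
```

A request for a sector just past the 1G point thus silently turns into
a read of a low-numbered sector, which is why the loaded data need not
look anything like a kernel.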

A third possibility is that the new kernel is ELF and the bootloader is
so old it doesn't grok ELF, only a.out.  This theory is knocked on the
head because the old bootloader is equally incapable of booting the old
kernel when it's in a new file.

The only plausible theory is thus the old-ROM one.

Now, your boot partition may indeed be smaller than 1G and you may
still suffer from this problem.  It's not the block number relative to
the beginning of the partition that matters but the block number
relative to the beginning of the disk.  All relevant blocks must have
absolute sector numbers that fit in 21 bits.  (In theory, you can
achieve this even if your boot partition passes the 1G mark; in
practice, since you can't easily control which blocks get used for a
new file, the only feasible way is to keep that whole partition below
that point.)
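
As a rough illustration of why the absolute position is what counts,
the following Python sketch checks whether a whole partition stays
below the limit.  The partition offsets and sizes are hypothetical
numbers chosen for the example, 512-byte sectors are assumed, and the
function names are invented here.

```python
CDB6_MAX_SECTORS = 1 << 21   # 2097152 sectors = 1G at 512 bytes per sector

def absolute_sector(partition_start, relative_block):
    """Translate a partition-relative block number to a whole-disk one."""
    return partition_start + relative_block

def partition_safe_for_cdb6(partition_start, partition_size):
    """True if every sector of the partition is below the 6-byte CDB limit."""
    return partition_start + partition_size <= CDB6_MAX_SECTORS

# Hypothetical 200M root partition (409600 sectors) at two different offsets.
print(partition_safe_for_cdb6(0, 409600))        # True
print(partition_safe_for_cdb6(1900000, 409600))  # False: it ends past 1G
print(absolute_sector(1900000, 300000))          # 2200000, beyond the limit
```

The second partition is far smaller than 1G, yet a file block 300000
sectors into it is already out of reach for such a ROM.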

> Therefore, the following install-kernel target makes sense

> install-kernel-${MACHINE_NAME}:
>         rm -f /onetbsd
>         ln /netbsd /onetbsd
>         cp netbsd /nnetbsd
>         mv /nnetbsd /netbsd

> It seems like a pretty bizarre way of installing a kernel, but it
> makes sense if the inode # has to be preserved across kernel
> installs.

Except that it doesn't preserve the inumber of /netbsd.  After that
runs, /onetbsd will be the same inumber /netbsd was, and /netbsd will
be a new inode, allocated when cp created /nnetbsd.  I think the point
is to install the kernel with mv, which uses rename(), thereby
ensuring that even if the machine crashes halfway through, there is
_something_, either the old or the new, in /netbsd.
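
The same sequence can be sketched in Python to show where the inodes
end up.  This is an illustration only, not how the kernel Makefile does
it; the function name and the root parameter are made up so the steps
can be tried in a scratch directory instead of /.

```python
import os
import shutil

def install_kernel(new_kernel, root="/"):
    """Mimic the quoted install-kernel target using link and rename."""
    target = os.path.join(root, "netbsd")
    backup = os.path.join(root, "onetbsd")
    staging = os.path.join(root, "nnetbsd")

    # rm -f /onetbsd
    try:
        os.unlink(backup)
    except FileNotFoundError:
        pass
    # ln /netbsd /onetbsd: the old inode gains a second name
    os.link(target, backup)
    # cp netbsd /nnetbsd: this allocates a brand-new inode
    shutil.copyfile(new_kernel, staging)
    # mv /nnetbsd /netbsd: rename() swaps the name over in one step
    os.rename(staging, target)
```

Comparing os.stat(...).st_ino before and after shows that the backup
name keeps the old inumber while the kernel name points at the new
inode, and at no point is the kernel name missing.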

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B