Port-amd64 archive


Re: NetBSD on embedded devices



On Mon, May 15, 2017 at 02:37:17PM +0300, Jukka Marin wrote:
> 
> I'm wondering what would be the best way of system updates.  I would
> like to have two separate system images, one that is active and running
> and another which can be updated.  At boot time, the system would have
> to check which image to boot from.  (Or maybe I could use chroot or
> some such to select the image to use.. or just mount one or the other..
> or use a virtual machine or.. ;-)

You're describing a fairly basic dual-image embedded setup.  Several
of us have done this with NetBSD over the years but, sad to say, I
think the results belong to each of our former employers.

Here is the best description I can give you of what I did once.  I wish
I had been able to open-source it and put it back in NetBSD, but that
really wasn't in the cards.

You won't be able to use our installer to do this; expect to get very
familiar with fdisk, disklabel, installboot, newfs, and friends.  You
can boot the install CD and pop out to a shell (thank goodness we have
not lobotomized our install CD like others to make this hard) or untar
base and etc into a PXE environment to get yourself going.

One key idea that may help you is not mine -- it's due to Matt Thomas.  It is
the idea of making an *additional* bunch of set files specifying
exactly and only the parts of the system you need.  Then install those
(embedbase.tgz, embedetc.tgz, etc.) rather than base.tgz, etc.tgz, and so forth.
That will give you a simple, maintainable way to build a NetBSD that
does what you need, is still a coherent whole rather than a pile of
disjoint "system packages", and is much smaller.
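Here is a minimal sketch of the packing step. It assumes you already have a populated DESTDIR from build.sh and a hand-written list of the files you want. The list format here is hypothetical: one ./path per line, with an optional set-name column that is ignored. NetBSD's own set lists live under distrib/sets/lists and are normally turned into sets by the build itself, which is probably the cleaner place to add yours. The function name `make_embed_set` is made up for illustration.

```shell
# make_embed_set DESTDIR LIST OUT   (hypothetical helper)
# Pack exactly the paths named in LIST from DESTDIR into the gzipped
# tarball OUT. LIST: one ./path per line, optionally followed by a set
# name; blank lines and lines starting with '#' are skipped.
make_embed_set() {
    destdir=$1 list=$2 out=$3
    set --                          # reuse "$@" as a quoting-safe path list
    while read -r path _setname; do
        case $path in
        ''|'#'*) continue ;;
        esac
        set -- "$@" "$path"
    done < "$list"
    if [ $# -eq 0 ]; then
        echo "make_embed_set: no paths in $list" >&2
        return 1
    fi
    # Name regular files, not directories: tar recurses into directories.
    tar -czf "$out" -C "$destdir" "$@"
}

# Example: make_embed_set "$DESTDIR" embedbase.list embedbase.tgz
```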

-------

Here are some goals:

* Boot quickly
* Maintain multiple complete images to allow quick reversion
* Keep a stable configuration even if the system is suddenly powered off
* Working serial console at all stages of boot even with crappy BIOS
* Don't burn out the "disk" by writing it too much

Here's some info on what I did.  It is rather specific to x86 and MBR
system partitioning and would need to be adapted for EFI/GPT or other
firmware and associated label formats.

1) Two complete NetBSD images installed, each described by its own
   partition in the MBR.  The same partitioning, however, is also
   described by a single BSD disklabel.

   Once the system's up, the BSD disklabel determines the partitioning.
   But at boot time, BIOS uses the MBR table to figure out which image
   to boot and find the first stage boot loader.  You'll need some
   scripting to ensure these are always in perfect sync.

   Since there's a single BSD disklabel (as opposed to, say, one per
   MBR partition like FreeBSD does it) it's natural and easy for the
   system images in the two MBR partitions to share temp/scratch space,
   access each other's contents for debugging or upgrades, etc.  You can even
   boot the system using the boot blocks from image A but the kernel and
   filesystem of image B, if the image is really screwed up.
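The "perfect sync" requirement is easy to check mechanically. Here is a hedged sketch. It reads `disklabel` output on stdin and compares one BSD partition's size and offset (both in sectors) with the start and size of the matching MBR partition, which you pass in. The function names are hypothetical. The `fdisk -S` usage in the trailing comment is from memory, so confirm the variable names in fdisk(8) first.

```shell
# label_part LETTER: from `disklabel` output on stdin, print "size offset"
# (in sectors) for partition LETTER, e.g. "a".
label_part() {
    awk -v p="$1:" '$1 == p { print $2, $3; exit }'
}

# check_sync LETTER MBR_START MBR_SIZE < disklabel-output
# Succeeds only if the BSD partition and the MBR partition cover exactly
# the same sectors.
check_sync() {
    letter=$1 start=$2 size=$3
    got=$(label_part "$letter")
    if [ "$got" = "$size $start" ]; then
        echo "ok: $letter matches MBR ($size sectors at $start)"
    else
        echo "MISMATCH: $letter label=[$got] mbr=[$size $start]" >&2
        return 1
    fi
}

# Possible use on the target (ASSUMPTION: fdisk -S prints sh assignments
# with names like PART0START/PART0SIZE; check fdisk(8)):
#   eval "$(fdisk -S wd0)"
#   disklabel wd0 | check_sync a "$PART0START" "$PART0SIZE"
#   disklabel wd0 | check_sync b "$PART1START" "$PART1SIZE"
```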

2) Now think of images "A" and "B", your two system images.  The
   correspondence goes like this.  The first MBR partition (partition 0
   in our fdisk's terminology) is NetBSD wd0a and system image "A" and
   we'll make a directory /A where it's always available when mounted,
   no matter whether image A or B is running.  The second MBR partition
   (NetBSD fdisk #1) is wd0b, system image "B" and is always at /B if
   it is mounted (the inactive image might not be mounted).

3) This brings us to the NetBSD fstab.  A system running from "A" is going
   to have:

	/dev/wd0a on / 	(ro) <--- there is a symlink to / at /A in this FS
	/dev/wd0f on /etc (rw,nodev,nosuid)
	tmpfs on /var and /tmp; ptyfs on /dev/pts (rw,nodev,noexec)
	/dev/wd0h on /var/crash (rw,noexec,nodev)

	*If* we are using the "B" image (upgrading, debugging, etc.)
	then we will have /dev/wd0b on /B and /dev/wd0g on /B/etc.  We'll
	also have a "swap" partition at /dev/wd0e marked as 'dp' but not
	'sw' in the fstab so we can take crash dumps but will never swap.

   If running from image "B", it looks like:

	/dev/wd0b on /  <--- there is a symlink to / at /B in this FS
	/dev/wd0g on /etc
	tmpfs on /var and /tmp; ptyfs on /dev/pts
	/dev/wd0h on /var/crash <---- note the sharing! wd0e is also shared.

   An important note is that your scripting for upgrades, etc. can easily
   find out whether it's running in image "A" or "B" just by looking to
   see whether /A or /B is a symlink to "/", without needing to know disk
   device names etc.
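Both the symlink test and the fstab above are easy to script. This is a sketch under several assumptions:

* The device names are exactly the ones in this layout.
* The tmpfs/ptyfs options are read off the listing above, which doesn't say precisely which options go with which of those three.
* The fsck pass numbers are a guess.
* The inactive image's mounts (/B and /B/etc, or /A and /A/etc) are left out, since they are made by hand or by the upgrade script.

`active_image`, `other_image` and `gen_fstab` are hypothetical names, and readlink(1) is used to compare the link target.

```shell
# active_image [ROOT]: print A or B depending on which of ROOT/A, ROOT/B
# is a symlink to "/". ROOT defaults to the real root; it is a parameter
# only so the logic can be tried in a scratch directory.
active_image() {
    root=${1-}
    for img in A B; do
        if [ -L "$root/$img" ] && [ "$(readlink "$root/$img")" = / ]; then
            echo "$img"
            return 0
        fi
    done
    echo "active_image: neither $root/A nor $root/B is a symlink to /" >&2
    return 1
}

# other_image A|B: the image an upgrade should target.
other_image() {
    case $1 in
    A) echo B ;;
    B) echo A ;;
    *) return 1 ;;
    esac
}

# gen_fstab A|B: emit the run-time fstab for one image. wd0h (/var/crash)
# and wd0e (dump-only "swap", never swapped on) are shared by both images.
gen_fstab() {
    case $1 in
    A) root=wd0a etc=wd0f ;;
    B) root=wd0b etc=wd0g ;;
    *) echo "usage: gen_fstab A|B" >&2; return 1 ;;
    esac
    cat <<EOF
/dev/$root   /           ffs    ro                0 1
/dev/$etc   /etc        ffs    rw,nodev,nosuid   0 2
tmpfs       /var        tmpfs  rw,nodev,noexec   0 0
tmpfs       /tmp        tmpfs  rw,nodev,noexec   0 0
ptyfs       /dev/pts    ptyfs  rw,nodev,noexec   0 0
/dev/wd0h   /var/crash  ffs    rw,nodev,noexec   0 2
/dev/wd0e   none        swap   dp                0 0
EOF
}
```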
 
4)  If you're following all this you're going to wonder what's with the
    separate mount on /etc.  Mounting /etc separately and r/w avoids the
    nasty business many vendors' embedded images do with mounting a
    memory filesystem and "unpacking" a config database (often basically
    just a tar file that's been dd'ed out to a separate disk partition)
    into it, etc.

    If you're going to mount a separate filesystem on top of the /etc
    that came in your "/", then ask yourself why monkey around with some
    complicated scheme involving a tmpfs?  You've already bought yourself
    the pain of needing to synchronize a minimal "boot-time /etc" that
    lives in the normally r/o "/" and the full "run-time /etc" so why
    suffer even more with some tmpfs/tar/unpack scheme?

    Here is a sad thing though about NetBSD from an embedded point of
    view (also true for other vendors' embedded Unices/Linuxes, I fear):
    /etc doesn't really contain just configuration data.  Since we keep
    executable scripts in there, it can't be mounted "noexec", so you
    can't easily achieve a system configuration that's W^X for executable
    code.  This could be fixed with enough effort by moving rc.d and
    friends elsewhere in /, though there's plenty of mischief to be
    wrought by an attacker tampering with configuration instead.  It is
    worth thinking about though as an area for future work in system
    integrity protection, if you care about such things.
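As a first step toward that kind of cleanup, here is a small sketch that lists the files under a directory whose first line starts with "#!". Those are interpreter scripts living in what is supposedly configuration space. It is only an approximation: files that sh merely sources have no "#!" line and are missed. `list_scripts` is a hypothetical name.

```shell
# list_scripts DIR: print each regular file under DIR whose first line
# begins with "#!". Sourced-only files (rc.conf, for example) are code to
# sh as well but carry no "#!" line, so this undercounts.
list_scripts() {
    find "$1" -type f | while IFS= read -r f; do
        first=
        # read fails at EOF without a newline but still sets $first.
        IFS= read -r first 2>/dev/null < "$f" || :
        case $first in
        '#!'*) printf '%s\n' "$f" ;;
        esac
    done
}

# Example: list_scripts /etc   (roughly what would have to move under /
# before /etc could be mounted noexec)
```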

    An implementation wart: at system shutdown, my scheme archives
    /var into /etc/var.tgz and at system startup, unpacks it, so /var
    can be a tmpfs.  An mtree file or something else would be cleaner
    but there does seem to be some expectation of persistence for data
    written to /var at runtime in most workloads.
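Here is a sketch of that save/restore pair. It assumes /var holds only what should persist. It also assumes /var/crash is the separate mount from the layout above, so it is skipped when saving and only its mount point is recreated when restoring.

The save writes to a temporary name and then renames it over the old archive. A rename within one filesystem is atomic, so a power cut in the middle of a save should leave the previous /etc/var.tgz intact. `save_var` and `restore_var` are hypothetical names. The archive path must be absolute because the save step changes directory.

```shell
# save_var [VARROOT] [ARCHIVE]: archive VARROOT (default /var) into
# ARCHIVE (default /etc/var.tgz), skipping the separately mounted crash/.
save_var() {
    varroot=${1:-/var} archive=${2:-/etc/var.tgz}
    (
        cd "$varroot" || exit 1
        set --
        for e in * .[!.]*; do
            [ -e "$e" ] || [ -L "$e" ] || continue   # unmatched glob
            [ "$e" = crash ] && continue             # separate filesystem
            set -- "$@" "$e"
        done
        [ $# -gt 0 ] || exit 0                       # nothing to save
        # Write beside the old archive, then rename over it.
        tar -czf "$archive.new" "$@" &&
            mv -f "$archive.new" "$archive" && sync
    )
}

# restore_var [VARROOT] [ARCHIVE]: unpack into a fresh tmpfs /var.
restore_var() {
    varroot=${1:-/var} archive=${2:-/etc/var.tgz}
    if [ -f "$archive" ]; then
        tar -xzpf "$archive" -C "$varroot" || return 1
    fi
    # Recreate the mount point for the separately mounted /var/crash.
    mkdir -p "$varroot/crash"
}
```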

    A note again: you'll need some special scripting to synchronize
    the few settings that matter (root password, mostly) between the
    runtime /etc and the /etc found on the boot filesystem.  This can
    be tricky to get right; ensure you test it (not knowing your root
    password when you need it to get init to let you in single-user is
    a terrible way to find out you needed to test more).
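Here is a minimal sketch of the root-password part. It assumes both copies are NetBSD-style master.passwd files that you can reach as ordinary paths. Note that the run-time /etc mount hides the boot-time /etc underneath it, so where you run this from depends on your own scheme. It copies root's entry from one file into the other and leaves every other line alone. `sync_root_pw` is a hypothetical name.

```shell
# sync_root_pw SRC DST: replace root's line in master.passwd-format file
# DST with root's line from SRC. Other entries are untouched.
sync_root_pw() {
    src=$1 dst=$2
    line=$(awk -F: '$1 == "root" { print; exit }' "$src")
    [ -n "$line" ] || { echo "sync_root_pw: no root entry in $src" >&2; return 1; }
    grep -q '^root:' "$dst" || { echo "sync_root_pw: no root entry in $dst" >&2; return 1; }
    (
        umask 077                   # master.passwd must not be world-readable
        # Pass the line via the environment so awk does not interpret it.
        ROOTLINE=$line awk -F: '
            $1 == "root" { print ENVIRON["ROOTLINE"]; next }
            { print }' "$dst" > "$dst.new" &&
            mv -f "$dst.new" "$dst"
    )
}
```

On NetBSD, the hashed password databases also have to be rebuilt afterwards with pwd_mkdb(8), whose -d option targets a different root directory. The boot filesystem must also be writable while you do this. Check the man pages for the exact invocation on your release.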

5)  Booting the system is pretty simple but you want a few things set
    up in an extremely precise way.

    A) DO NOT configure the NetBSD MBR bootstrap for serial mode.
       Inconsistent BIOS behavior will drive you bonkers if you do this.

       Instead, configure the BIOS to perform serial console redirection
       and select "Always" or whatever option tells it to keep going even
       after it thinks the OS has started, and let that cause the MBR's
       "console" output to go out the serial port.

    B) Install the NetBSD MBR bootstrap in "boot selector" mode.  Now you
       can press "1" to get image "A" or "2" to get image B.  With no input
       the bootstrap will boot whatever's marked as "active" in the MBR.

       So, the last thing your upgrade procedure does is use fdisk -a to
       change which image is marked as "active".

    C) DO configure the primary boot loader for serial mode.  Explicitly
       configure it with the correct port name (comX), speed (e.g. 9600)
       and the I/O address for that port (e.g. 0x3f8).  If you don't
       expressly configure the ioaddr, the BIOS can do stupid things like
       hide one of the "com" ports out from under you and NetBSD's second
       stage bootloader and kernel will get very confused.
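The one-time and per-upgrade commands might look like the sketch below. Setting DRYRUN prints the commands instead of running them. A few points are assumptions:

* com0, 9600 and 0x3f8 are the example values from the text.
* bootxx_ffsv1 assumes FFSv1 filesystems (bootxx_ffsv2 would be the FFSv2 counterpart).
* The non-interactive fdisk flags for changing the active partition should be verified against fdisk(8) on a scratch disk.

The function names are hypothetical.

```shell
# run CMD...: with DRYRUN set, print the command instead of running it.
run() {
    if [ -n "${DRYRUN-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One-time: install the boot-selector MBR, then set up its menu
# interactively (entries for image A and B, default, timeout).
setup_bootsel() {
    run fdisk -c /usr/mdec/mbr_bootsel wd0
    run fdisk -B wd0
}

# Per image: primary bootstrap with the serial console spelled out,
# including the I/O address so BIOS port shuffling cannot confuse it.
install_image_boot() {
    case $1 in
    A) part=wd0a ;;
    B) part=wd0b ;;
    *) echo "usage: install_image_boot A|B" >&2; return 1 ;;
    esac
    run installboot -o console=com0,speed=9600,ioaddr=0x3f8 \
        "/dev/r$part" /usr/mdec/bootxx_ffsv1
}

# Last step of an upgrade: mark the new image's MBR partition active.
# ASSUMPTION: -f (batch mode) plus -a and a -0/-1 partition selector.
set_active_image() {
    case $1 in
    A) n=0 ;;
    B) n=1 ;;
    *) echo "usage: set_active_image A|B" >&2; return 1 ;;
    esac
    run fdisk -f -a "-$n" wd0
}
```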

6)  Replace /etc/rc in the skeleton /etc that is on the / filesystem (wd0a
    or wd0b) with a trivial script that fscks and mounts your /etc filesystem
    and transfers control to the real /etc/rc by doing an "exec /bin/sh
    /etc/rc" after the mount.  Be sure to handle fsck's error cases.

    Understand that doing it this way means init(8) has a different view
    of /etc's contents than the whole rest of the system.  There are a
    number of approaches for dealing with this; you can use the other image,
    or you can drop a flag file or log of the changes you wanted somewhere
    the early boot script (real /etc/rc) can find it, then cause a reboot.

    An elegant approach would instead be to change init(8) to, once only,
    reopen all its file descriptors upon receipt of a signal.  This interacts
    with the libc guts, so it may be more work than you think.
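The skeleton /etc/rc's fsck-and-mount step can be sketched as follows. It assumes the layout above (wd0f holds image A's /etc, wd0g holds image B's) and that any fsck failure should simply stop the boot. When /etc/rc exits non-zero, init(8) drops to single-user mode, which is where knowing the boot-time root password pays off. The FSCK_FFS, MOUNT and ROOT variables exist only so the logic can be tried off-target. `mount_real_etc` is a hypothetical name.

```shell
# Overridable only for testing; on the target these are the real tools.
: "${FSCK_FFS:=/sbin/fsck_ffs}"
: "${MOUNT:=/sbin/mount}"
: "${ROOT:=}"

mount_real_etc() {
    # /A is a symlink to / only inside image A; otherwise we are image B.
    if [ -L "$ROOT/A" ]; then
        dev=wd0f
    else
        dev=wd0g
    fi
    "$FSCK_FFS" -p "/dev/r$dev"
    status=$?
    case $status in
    0) ;;
    8) echo "rc: /dev/r$dev needs a manual fsck" >&2; return 1 ;;
    *) echo "rc: fsck of /dev/r$dev exited $status" >&2; return 1 ;;
    esac
    "$MOUNT" -t ffs -o rw,nodev,nosuid "/dev/$dev" /etc
}

# The rest of the skeleton /etc/rc is then just:
#   mount_real_etc || exit 1   # non-zero exit: init(8) goes single-user
#   exec /bin/sh /etc/rc       # /etc is now the run-time one
```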

7)  Explaining how to install and upgrade a system that behaves as described
    above would take a lot of space, and I'm not sure I remember all the
    details perfectly.  Here is what I suggest instead: configure one system
    by hand as described above, manually install ("upgrade") its sister image
    onto /B once, then run from /B and "upgrade" the same image back onto /A.
    If you take good notes, you'll be in a good place to script it.

Whew.  That went on way too long.  I hope it was helpful.  I've been meaning
to write this down somewhere for years.

Thor

