Subject: MBR lossage (was: Success with Compaq Deskpro XL 566 (AMD AM79C974))
To: None <port-i386@netbsd.org>
From: Ken Harrenstien <klh@us.oracle.com>
List: port-i386
Date: 04/01/1999 02:55:25
OK, all fixed and booting 1.3K from the hard drive now.  I thought I'd
mention the details for posterity, because it sure looks to me as if
there's a bug in either the MBR or the programs that set it.  (I
appreciate the suggestions that people sent, even though it turns out I
didn't need to use any of them.)

FWIW the best references I found were:

	www.pcguide.com - Excellent introduction!  Especially the
		sections on BIOS booting and "Major Disk Structures".
	/usr/include/sys/disklabel_mbr.h - for details on
		what fdisk actually mungs.

After browsing through the PC Guide and examining the actual MBR (with
"dd if=/dev/rsd0d count=1 | hexdump -C") I realized that the error
message I was getting -- "Invalid partition table" -- was actually
coming from the **NetBSD** MBR boot code!  The Compaq BIOS had nothing
to do with it.
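
For anyone who would rather decode that dump than read the hex by eye,
here is a minimal Python sketch of the standard PC MBR layout: four
16-byte entries starting at byte 446, followed by the 0x55 0xAA
signature.  The function and key names (parse_mbr, "flag", "sysid" and
so on) are made up for illustration and are not taken from fdisk or
from disklabel_mbr.h.

```python
import struct

MBR_SIZE = 512          # one sector, as read by "dd count=1"
TABLE_OFFSET = 446      # partition table starts here
ENTRY_SIZE = 16         # four entries of 16 bytes each
SIGNATURE = b"\x55\xaa" # last two bytes of a valid MBR


def decode_chs(head, sect_byte, cyl_byte):
    """Unpack the 3-byte CHS field into (cylinder, head, sector)."""
    sector = sect_byte & 0x3F                      # low 6 bits
    cylinder = ((sect_byte & 0xC0) << 2) | cyl_byte  # top 2 bits + 8 bits
    return (cylinder, head, sector)


def parse_mbr(sector):
    """Return a list of four dicts, one per partition table slot."""
    if len(sector) < MBR_SIZE:
        raise ValueError("need a full 512-byte sector")
    if sector[510:512] != SIGNATURE:
        raise ValueError("missing 0x55AA signature")
    entries = []
    for i in range(4):
        off = TABLE_OFFSET + i * ENTRY_SIZE
        (flag, bh, bs, bc, sysid, eh, es, ec,
         start, size) = struct.unpack("<8B2I", sector[off:off + ENTRY_SIZE])
        entries.append({
            "flag": flag,                    # 0x80 bit marks "active"
            "sysid": sysid,                  # 165 in the table below
            "beg": decode_chs(bh, bs, bc),
            "end": decode_chs(eh, es, ec),
            "start": start,                  # first sector (LBA)
            "size": size,                    # length in sectors
        })
    return entries
```

To use it, save the dd output to a file, read it in binary mode, and
pass the first 512 bytes to parse_mbr().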

I verified that this MBR was identical to /usr/mdec/mbr except for the
partition information, which looked like this:

    # fdisk sd0
    ******* Working on device /dev/rsd0d *******
    Warning: BIOS sector numbering starts with sector 1
    parameters extracted from in-core disklabel are:
    cylinders=6703 heads=5 sectors/track=126 (630 sectors/cylinder)
    
    parameters to be used for BIOS calculations are:
    cylinders=1031 heads=65 sectors/track=63 (4095 sectors/cylinder)
    
    Information from DOS bootblock is:
    0: <UNUSED>
    1: <UNUSED>
    2: <UNUSED>
    3: sysid 165 (FreeBSD or 386BSD or old NetBSD)
        start 63, size 4156362 (2029 MB), flag 0x81
            beg: cylinder    0, head   1, sector  1
            end: cylinder 1014, head  64, sector 63
    #

Now, this is the partition table that the NetBSD installation process
gave me by default when I asked it to use the whole disk, which means
it should have worked rather than offending its own MBR boot code.
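
As a sanity check that the numbers in that entry are at least
self-consistent, the following sketch converts the begin/end CHS values
to linear sector numbers using the BIOS geometry fdisk printed (65
heads, 63 sectors/track).  It uses the usual CHS-to-LBA formula; the
helper name chs_to_lba is only illustrative.

```python
SECTOR_BYTES = 512
HEADS = 65   # from "parameters to be used for BIOS calculations"
SPT = 63     # sectors per track, same line of fdisk output


def chs_to_lba(cyl, head, sect, heads=HEADS, spt=SPT):
    """Standard CHS -> LBA conversion; BIOS sectors are numbered from 1."""
    return (cyl * heads + head) * spt + (sect - 1)


start = chs_to_lba(0, 1, 1)          # beg: cylinder 0, head 1, sector 1
end = chs_to_lba(1014, 64, 63)       # end: cylinder 1014, head 64, sector 63
size = end - start + 1               # sectors covered by the CHS range
size_mb = size * SECTOR_BYTES // (1024 * 1024)

print(start, size, size_mb)          # 63 4156362 2029
```

The CHS range works out to start 63, size 4156362 (2029 MB), exactly
what the entry claims, so the geometry fields are not the problem.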

I pulled over and searched the entire "sys" source snapshot tree but
was unable to find the source for /usr/mdec/mbr.

Going on my guess that the MBR code might be fairly stupid, I tried
using fdisk to flesh out partition #0 by simply copying all the numbers
from partition #3, and then making #0 the active partition.  This
worked!  The only difference between partitions #0 and #3 now is in the
flags; #0 has 0x80 and #3 has 0x1.  I'll clean out #3 later, but I'm
curious why its flags were set to 0x81 in the first place.  I imagine
the flags are all defined somewhere, but the place escaped me.

It wasn't until I also grabbed the "sbin" tree that I *finally* found
the MBR code, hidden away under "fdisk".  This is rather confusing; I'd
have expected to find it in sys/arch/i386/stand/ along with the other
i386 bootstrap stuff.  Grumble, grumble -- I suppose there must be some
good reason for violating the separation of generic from arch-specific
code...  whatever.  Anyway, I don't know much about i386 assembler, but
it looks to me as if it's scanning for the bootable partition by
looking for flags of exactly 0x80.  If my guess is correct, it failed
to work because the flags were 0x81, not because the first three
partitions were empty.

So, there are one or two possible problems:

	(1) MBR code should look for bit 0x80, not value 0x80.  And/or:
	(2) Sysinst shouldn't set flags to 0x81.
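
The difference in (1) can be illustrated with a small Python model.
This is only a sketch of the guess above about how the MBR scans the
table; it is not a translation of the actual assembler, and both
function names are hypothetical.

```python
ACTIVE = 0x80  # the "bootable" bit in a partition entry's flag byte


def find_boot_exact(flags):
    """Scan as the MBR code is guessed to: flag must equal 0x80."""
    for slot, flag in enumerate(flags):
        if flag == ACTIVE:
            return slot
    return None  # the real code would print "Invalid partition table"


def find_boot_bit(flags):
    """Suggested alternative: accept any flag with bit 0x80 set."""
    for slot, flag in enumerate(flags):
        if flag & ACTIVE:
            return slot
    return None


as_installed = [0x00, 0x00, 0x00, 0x81]  # table left by the installer
workaround = [0x80, 0x00, 0x00, 0x01]    # after copying #3 into #0

print(find_boot_exact(as_installed))  # None
print(find_boot_bit(as_installed))    # 3
print(find_boot_exact(workaround))    # 0
```

With the installer's table, the exact-value scan finds nothing while
the bit test finds partition #3; with the workaround, both find #0.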

Thanks again!
--Ken

(p.s.  Hopefully I'll soon arrive at a point where I can use send-pr
and won't need to clutter this list any more :-)

(p.p.s.  For a supposedly slow and obsolete 66MHz machine, NetBSD makes
it look surprisingly and impressively spiffy!  As a way of returning
the favor, if anyone else thinks this class of everything-but-monitor
SCSI setup was in fact worth $100, I'd be happy to refer you to the
place I found them.)