Subject: Re: sysinst report [was: 1.3Beta]
To: Ken Hornstein <kenh@cmf.nrl.navy.mil>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: port-i386
Date: 12/05/1997 13:59:49
>>>6. I messed up a bit and reloaded several times (on the same boot),
>>>   once from the initial menu option -- specifying the disk geometry
>>>   options and several times from the upgrade menu option.  I also
>>>   had an old disklabel in the NetBSD partition when I started.  But I did
>>>   get the system to extract into root & /usr.  When I rebooted, I 
>>>   discovered that my MBR was 0.  THIS SHOULD NEVER BE POSSIBLE.  

[snip]

>>Appalling.

>You know, when doing my most recent install, I almost did this ... and
>I think that I understand part of the problem.

>I have a SCSI controller, and I have translated BIOS geometry.  pfdisk
>and the NetBSD fdisk _report different information_ .... and the NetBSD
>fdisk information is the one you need to use to feed to disklabel so
>it doesn't wipe the MBR.

I've seen something similar with disklabel, on a BusLogic PCI
controller with 1 GB and 2 GB drives, if that makes a difference.

ISTR there is (or was) a bug in disklabel, which meant it would never
write MBRs correctly on such systems. (The only way to get disklabel
to write a label was by saying yes to the `erase entire disk' prompt,
which promptly clobbered an existing MBR with garbage and left the
system unbootable.  Feh.)

The cause there was different, though, and AFAIK sysinst avoids using
disklabel to write MBRs for that very reason.

Robert Baron points out that the sysinst messages call two different
geometries the `real' geometry (once in the i386 MBR code, once in the
disklabel code).  So at some point someone was probably confused :).

I do think the problem is *not* lack of error-checking of the MBR
content, it's that a plausible (or halfway plausible) MBR is getting
written to the wrong place on the disk. Or that something else
(disklabelling, maybe?) subsequently clobbers the MBR by using a
`wrong' geometry; I can't tell which.
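To make the geometry point concrete: the same cylinder/head/sector
address lands on a different absolute sector depending on which
geometry you assume.  Below is a minimal C sketch, assuming the usual
CHS-to-LBA formula; the struct geom type, the chs_to_lba() function
and the example geometries are hypothetical, not code from sysinst,
fdisk or disklabel.

```c
#include <stdint.h>

/* Hypothetical geometry description (illustrative names only). */
struct geom {
    uint32_t cylinders;
    uint32_t heads;     /* tracks per cylinder */
    uint32_t sectors;   /* sectors per track, numbered from 1 */
};

/*
 * Convert a CHS address to an absolute sector number using
 *     LBA = (c * heads + h) * sectors + (s - 1)
 * Returns -1 if the address is out of range for this geometry.
 */
static int64_t chs_to_lba(const struct geom *g,
                          uint32_t c, uint32_t h, uint32_t s)
{
    if (s == 0 || s > g->sectors || h >= g->heads || c >= g->cylinders)
        return -1;
    return ((int64_t)c * g->heads + h) * g->sectors + (s - 1);
}
```

With a translated 64-head/32-sector geometry, CHS (1,0,1) is sector
2048; with a 16-head/63-sector geometry the same tuple is sector 1008.
So anything that computes a write offset from one geometry while the
on-disk tables were laid out with the other will write to the wrong
place.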


>I really don't understand all of the magic here ... anyone have a clue
>what causes this?  I can provide more information if needed.


That might really help.  fvdl is the person to contact, since (AFAIK)
he'll be working on this in the immediate future, and he has a test
environment set up.  Whether it includes SCSI controllers with
controller-level fictitious geometry mappings, I don't know.