port-i386: sysinst report [was: 1.3Beta]

Subject: sysinst report [was: 1.3Beta]
To: Robert.V.Baron <rvb@gluck.coda.cs.cmu.edu>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: port-i386
Date: 12/04/1997 16:35:52

I'm cc'ing this reply to current-users as well as port-i386 since
there are a couple of other ports using sysinst, and most of the
problems below are really MI...

[snip]

>I was sort of disappointed with the sysinst interface and think that
>it needs a bunch of work to make it friendly/usable for naive users.
>Some of these points have been raised earlier but are not yet fixed.
>My comments (somewhat stream of consciousness) follow:

>-------------------- trial one

>0. I did not look too hard but I could not find an INSTALL document
>   on ftp.netbsd.org.

It's there.


>1. The program name should be something more understandable
>   than sysinst -- how about install or menu_install and maybe
>   the existing install can be install.old

This is a separate thread now; a good name is a good thing, but I
don't think it's really relevant to problems with sysinst per se.


>2. In sysinst, most menus end with an x (exit) option.  This
>   really gets you to the next menu so why not call it n (next).


It doesn't always mean `next'; often it means `go back to parent'.
I'd prefer to just use `x' consistently to mean `leave this menu'
(changing the `e' in the main menu to `x') and have the documentation
clarify what happens where.

Would you be unhappy with that?


>3. When you net install, you should be allowed to not specify a
>   nameserver or gateway.  Then you would use the ip addresses to
>   ftp or nfs to.

I'm surprised you'd want to type more addresses than necessary, but
whatever.  How about we make it possible to enter 0.0.0.0 for the
gateway/DNS addresses, and let you enter dotted-quads at the other
points?  Would that work, if it was documented?
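
[A minimal sketch of that proposal, in Python rather than sysinst's C.
The function names are hypothetical and not taken from sysinst; the
only assumption is that 0.0.0.0 stands for "no gateway/nameserver" and
that a host given as a dotted-quad needs no DNS lookup.]

```python
import ipaddress


def parse_optional_addr(text):
    """Parse a gateway/DNS entry; 0.0.0.0 means 'none configured'."""
    addr = ipaddress.IPv4Address(text.strip())
    return None if addr.is_unspecified else addr


def is_dotted_quad(host):
    """True if the ftp/nfs host was typed as an IPv4 dotted-quad."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ipaddress.AddressValueError:
        return False


def can_reach(host, nameserver):
    """A host name needs a nameserver; a dotted-quad does not."""
    return is_dotted_quad(host) or nameserver is not None
```

With no nameserver configured, can_reach() accepts a dotted-quad host
and rejects a host name, so the user can be told at the prompt instead
of after a failed transfer.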


>4. As someone remarked, if ftp fails the messages blast by so fast
>   that you can not tell what happened.

This has been a *long-standing* problem.  I'm not sure how to bash on
curses to get around it.  Perhaps Phil can look at it this weekend.

Another idea I had is to at least check that the .tgz file exists, and to
present a summary after unpacking all sets:
	X tarballs selected,
	Y not found at all, Z found,
	P of the Z unpacked cleanly,  Q of the Z encountered errors.

I think that would catch most problems or at least alleviate them.
Comments?
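
[A rough sketch of such a summary, in Python rather than sysinst's C.
SetSummary and unpack_sets are hypothetical names, and the sketch
assumes the sets are local gzip'ed tarballs named <set>.tgz.]

```python
import os
import tarfile
from dataclasses import dataclass, field


@dataclass
class SetSummary:
    selected: int = 0
    missing: list = field(default_factory=list)  # no .tgz file at all
    clean: list = field(default_factory=list)    # unpacked without error
    failed: list = field(default_factory=list)   # found, but unpack failed

    @property
    def found(self):
        return len(self.clean) + len(self.failed)

    def report(self):
        return (f"{self.selected} tarballs selected,\n"
                f"{len(self.missing)} not found at all, {self.found} found,\n"
                f"{len(self.clean)} of the {self.found} unpacked cleanly, "
                f"{len(self.failed)} of the {self.found} encountered errors.")


def unpack_sets(set_dir, names, target):
    """Check each set exists, try to unpack it, and tally the outcome."""
    summary = SetSummary(selected=len(names))
    for name in names:
        path = os.path.join(set_dir, name + ".tgz")
        if not os.path.isfile(path):
            summary.missing.append(name)
            continue
        try:
            with tarfile.open(path, "r:gz") as tar:
                tar.extractall(target)
            summary.clean.append(name)
        except (tarfile.TarError, OSError, EOFError):
            summary.failed.append(name)
    return summary
```

Printing summary.report() once at the end leaves the result on screen
even if the per-file messages have scrolled past.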



>5. There should be an option that lets you go out and do the ftp's
>   by hand and then go back in and do the extracts.

You can go to a shell and FTP. My own tree has code to do the extracts
_only_, from any of the already-supported choices or an
already-mounted local dir. Would that work for you?


>6. I messed up a bit and reloaded several times (on the same boot),
>   once from the initial menu option -- specifying the disk geometry
>   options and several times from the upgrade menu option.  I also
>   had an old disklabel in the NetBSD partition when I started.  But I did
>   get the system to extract into root & /usr.  When I rebooted, I 
>   discovered that my MBR was 0.  THIS SHOULD NEVER BE POSSIBLE.  The 
>   install should always check to make sure that the mbr looks plausible
>   before it is written.  Luckily, I had the contents written down, so
>   I could use pfdisk and fdisk /mbr to recover.

Appalling.
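
[A sketch of the kind of plausibility check being asked for, in Python
rather than sysinst's C.  mbr_looks_plausible is a hypothetical name.
The checks assume the standard PC layout: a 512-byte sector with four
16-byte partition entries at offset 446 and the 0x55 0xAA signature in
the last two bytes.  Requiring at least one partition in use is a
policy assumption, not something the format itself demands.]

```python
def mbr_looks_plausible(sector):
    """Cheap sanity checks to run on an MBR before writing it to disk."""
    if len(sector) != 512:
        return False
    # Boot signature: an all-zero sector fails here.
    if sector[510:512] != b"\x55\xaa":
        return False
    entries = [sector[446 + 16 * i:446 + 16 * (i + 1)] for i in range(4)]
    # Byte 0 of each entry is the boot flag: 0x00 or 0x80 only.
    if any(e[0] not in (0x00, 0x80) for e in entries):
        return False
    # Byte 4 is the partition type; zero means the slot is unused.
    return any(e[4] != 0 for e in entries)
```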

>6a.
>   It looks like the disklabel never made it back to disk.  When I rebooted
>   the a slice was the old disklabel info, but the a fs was much bigger
>   than the slice, hmmm.

Did you by any chance use a label name containing whitespace?  That
would fail, and it's the only way I've ever persuaded the
disklabel-writing code to fail.  That was fixed some days ago, but may
not be in the dec 1 i386 snapshot.



>7. I also tried a simple install (like in days of old).  It worked fine
>   except there was no hint of where/how I was to go to get a kernel.


kern.tgz, as mentioned already.


>---------------------- trial two.
>
>   I "cp sysinst /dev/wd0a" to clean things up and restart.
>

[1 and 2: i386-specific geometry problems being looked at by fvdl.]


>3. I choose to specify geometry in cyl's then choose the standard layout.
>   (can't believe netbsd is 500+ meg these days.)  When it shows the layout,
>   it uses meg.  I want cyl so I can make sure I am not breaking anything!

Could you confirm this is the *disklabel* geometry, not the MBR
geometry?  Changing disklabel units seems to work for me on a pmax...



>4. After 3, if you do "change a partition", and go to set new allocation size
>   you can choose cyl and it now sticks.



>5. There are a bunch of errors about removing / from absolute path names in archive.

Hm.  sysinst copies essential binaries from the ramdisk root to the
target root, so that if Disaster Strikes, you can proceed by booting
from the target disk.  Sounds like the tar commands there are less
paranoid than they should be.  For historic reasons, this command is
MD, not MI.

On a pmax this gets done silently.  Perhaps fvdl or Simon Burge could
apply the same fix to the i386 port?


>6. We'll install from ftp.netbsd.org.  But this is taking forever.  Time to abort.
>   sysinst realizes that we are aborting and I eventually make it to the main menu.
>   I try an upgrade, but the fs's are still mounted so this fails. 

This is fixed in -current.  Both install and upgrade keep a list of
target mounts and undo them all when you go back to the main menu.
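
[A sketch of that bookkeeping, in Python rather than sysinst's C.
TargetMounts is a hypothetical name and the code is an illustration
of the idea, not the code in -current.  The mount and umount callables
are injectable so the logic can be exercised without root.]

```python
import subprocess


def _run_mount(dev, where):
    subprocess.run(["mount", dev, where], check=True)


def _run_umount(where):
    subprocess.run(["umount", where], check=True)


class TargetMounts:
    """Remember every target mount so all of them can be undone later."""

    def __init__(self, mount=_run_mount, umount=_run_umount):
        self._mount = mount
        self._umount = umount
        self._mounted = []

    @property
    def mounted(self):
        return list(self._mounted)

    def mount(self, dev, where):
        self._mount(dev, where)
        self._mounted.append(where)  # recorded only if the mount succeeded

    def unwind(self):
        # Unmount in reverse order so /mnt/usr goes before /mnt.
        while self._mounted:
            self._umount(self._mounted[-1])
            self._mounted.pop()
```

Calling unwind() on every return to the main menu means a later
upgrade attempt starts with nothing left mounted.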


>   Time to exit,
>   unmount, and sysinst again
>
>7. I decide to do a "local" upgrade.  It chooses sd0 as the Device which is a
>   real device in my machine, but everything so far has happened on wd0.  So
>   why do I really want to use it instead?

   Poor choice of menu titles.

   This menu entry is intended for mounting an unmounted device,
   and grabbing sets from there.  It would be used if someone has
   a machine with a non-NetBSD OS to which they've preloaded installation
   sets.

   It sounds like you're trying to use it to specify a path to
   an already-mounted directory containing sets.
   My sysinst tree has this; I hope it'll make it into the release
   tree by this weekend.


>
>8. I specify wd0e and directory /usr/INSTALL -- it seems to want to mount 
>   /usr/INSTALL on wd0e -- how silly.  

	Uh, no, it's what sysinst thinks you *asked* it to do.
	I've committed a much clearer description to the English
	message text; I hope Manuel can do the same thing
	for msg.mi.fr.

	When the `install from already-mounted dir' gets
	committed, this problem should go away.

>9. But putting the .tgz files in /usr/INSTALL doesn't work either.  It refuses
>   to find them.

	'Fraid so. See above.


>a. We exit, unmount, and restart again.  This time something goes wrong
>   "mv /mnt/etc /mnt/etc.old".  The script aborts and now when I type I
>   don't see the echo.  (stty echo fixes this.)

This is *two* bugs.

First, the `upgrade' procedure tries to save /etc in /etc.old.  The
first time, you get /etc.old. The second time, you get /etc.old/etc.
The third time (which you'd reached) it fails.

The obvious fix is to check for an existing /etc.old, and if found,
ask the user to clean it up (via the fork-a-shell) before proceeding.
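
[A sketch of that check, in Python rather than sysinst's C.  save_etc
is a hypothetical name; the caller would pass the target root (/mnt in
the report above) and offer the shell when it returns False.]

```python
import os


def save_etc(target_root):
    """Move etc to etc.old under target_root, unless etc.old exists.

    Returns True if etc was saved, False if an existing etc.old must be
    cleaned up by the user first.
    """
    etc = os.path.join(target_root, "etc")
    saved = os.path.join(target_root, "etc.old")
    if os.path.lexists(saved):
        # Never nest a second copy inside the first one.
        return False
    os.rename(etc, saved)
    return True
```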

The second is the tty echo, which I think Simon Burge has already fixed.


>b. Well, it's time to bail and do a simple "install" installation.  But before
>   I do that, I did one last "disklabel wd0".  The root is 277Meg, swap is 258Meg
>   and /usr is the rest 561meg.  Why is root so big?  Further, I have 128Meg of
>   primary memory not to swap.  I don't want to waste 258meg of disk.

I don't know, I'm told it was copied verbatim from the install sh
scripts.

Again, thanks for the feedback!