Subject: i386 cd-, pxe-, and dos-boot anomalies
To: None <port-i386@NetBSD.org>
From: Chapman Flack <nblists@anastigmatix.net>
List: port-i386
Date: 05/24/2006 14:04:09

A friend was given an Ubuntu Live CD to try out on a hoary old box
now running Win98 (100 MHz Pentium, PCI/ISA, 80MB RAM, 2 IDE).

Ubuntu didn't make a great first impression, as the first boot access
to the CDROM immediately flickers the screen and cycles back to POST.

Smiling serenely, I said "Of Course It Runs NetBSD, so let's just boot
off my handy CD here and take a look around."

That gave a different result. Instead of cycling back to POST, the first
access to my 3.0 ISO pops up an AMIBIOS configuration screen, ignores
the keyboard, and requires a hard reset.

PLAN A WAS CD BOOT.  OK, NOW FOR PLAN B.

Still smiling serenely, I pulled out a NetBSD laptop, enabled dhcpd
and tftpd, reset the hoary beast and sat back to watch the PXE boot.

Hoary beast's console says "DHCP..." - my console says
DHCPDISCOVER - DHCPOFFER - DHCPREQUEST - DHCPACK. Hoary says "TFTP..." -
my box says "read request for pxeboot_ia32.bin: success"

Hoary sits there.

After a timeout, I see another "read request for pxeboot_ia32.bin:
success" and hoary still sits there. After a couple more "successful"
but ineffective requests for pxeboot_ia32.bin, hoary gives up and
boots Windows.

tcpdump shows the following exchange going on (condensed for clarity):

hoary > server: tftp rrq pxeboot_ia32.bin octet blksize 1456
s > h: tftp oack blksize 1456
(timeout)
s > h: tftp oack blksize 1456
(timeout)
h > s: tftp rrq pxeboot_ia32.bin octet blksize 1456
s > h: tftp oack blksize 1456
(timeout)
s > h: tftp oack blksize 1456

and so on. The protocol requires hoary to ack my oack to confirm the
negotiated blksize, after which I would begin sending data blocks.
Hoary would have to send an error packet with code 8 if not satisfied
with the negotiated value, but that's unlikely as it's exactly the
value requested. Instead, hoary seems to be ignoring the oacks completely!
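
For anyone not steeped in the TFTP option extension (RFC 2347, with
blksize from RFC 2348), here is a rough Python sketch of the packets
involved and of the reply a well-behaved client should make to an oack.
The function names are made up for illustration; this is not code from
tftpd or from any PXE ROM, only the wire format as I understand it.

```python
import struct

# TFTP opcodes (RFC 1350, plus OACK from RFC 2347)
OP_RRQ, OP_DATA, OP_ACK, OP_ERROR, OP_OACK = 1, 3, 4, 5, 6
ERR_OPTION_REFUSED = 8  # RFC 2347: option negotiation terminated


def encode_options(options):
    """Encode options as NUL-terminated name/value string pairs."""
    out = b""
    for name, value in options.items():
        out += name.encode("ascii") + b"\0" + str(value).encode("ascii") + b"\0"
    return out


def build_rrq(filename, mode="octet", options=None):
    """Read request: opcode, filename, mode, then any options."""
    head = struct.pack("!H", OP_RRQ)
    head += filename.encode("ascii") + b"\0" + mode.encode("ascii") + b"\0"
    return head + encode_options(options or {})


def build_oack(options):
    """Option acknowledgment sent by the server instead of the first DATA."""
    return struct.pack("!H", OP_OACK) + encode_options(options)


def parse_options(payload):
    """Decode NUL-terminated name/value pairs into a dict."""
    fields = payload.split(b"\0")[:-1]  # drop the empty piece after the last NUL
    return {
        fields[i].decode("ascii").lower(): fields[i + 1].decode("ascii")
        for i in range(0, len(fields) - 1, 2)
    }


def client_reply_to_oack(oack, requested_blksize):
    """What the client is expected to send back after receiving an oack."""
    (opcode,) = struct.unpack("!H", oack[:2])
    if opcode != OP_OACK:
        raise ValueError("not an OACK packet")
    blksize = int(parse_options(oack[2:]).get("blksize", 512))
    if blksize > requested_blksize:
        # The server may only answer with a value <= the one requested.
        return struct.pack("!HH", OP_ERROR, ERR_OPTION_REFUSED) + b"blksize refused\0"
    # ACK with block number 0 confirms the options; DATA block 1 follows.
    return struct.pack("!HH", OP_ACK, 0)
```

In the trace above, the four-byte ack for block 0 is exactly the packet
that never shows up after each oack.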

It didn't take long to find a report of exactly the same behavior
with the same PXE firmware (Intel LANDesk Service Assistant 0.99b):
  http://syslinux.zytor.com/archives/2002-February/000104.html

The thread included a solution!  At least for a Linux server,
disabling MTU discovery reportedly got the reply packets to be
recognized by the PXE client.

So my serene smile returned, I reset everything, typed
'sysctl -w net.inet.ip.mtudisc=0' on my laptop, restarted the hoary
beast, and sat back to watch the successful PXE boot.

Only, nothing changed. Apparently MTU discovery isn't the answer
to this problem in all cases.  :(

PLAN B WAS PXE BOOT. OK, NOW FOR PLAN C.

... but first ...

Has anyone experienced either of these issues, or does anyone know of
workarounds? The NIC is a Compaq NC3121 (fxp), its option ROM announcement
is "PXE-M04 hooking bootstrap int 19h" and the Intel LSA version again is
0.99b. If a workaround is known, I can add it to the pxeboot docs.

As for the CD booting problem, I saw reports that some old BIOSes
care whether the drive is slaved on the primary or secondary IDE,
but I tried both with no effect; I also tried replacing the fairly
new Toshiba DVD burner with the older Toshiba CDROM that was originally
in the machine, but that made no difference either. The hoary beast
is running an AMI BIOS AP53 R2.30 Jun.04.1996.

OK, NOW for plan C.

I booted Windows and FTP'd down a gzipped kernel and dosboot.com
from 200605220000Z.  That almost worked. dosboot seems to inflate
the gzipped kernel OK (the progress numbers look reasonable), but then
it prints the warning "read symbols" and hangs. Maybe the lseek()
replacement in libsa is broken somehow?  After manually gunzipping
the kernel, I at last had a successful boot.
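
If no gunzip is handy on the box doing the download, the decompression
workaround can be sketched in a few lines of Python using only the
standard gzip and shutil modules. The helper name and the file names in
the comment are placeholders, not the names of the files I fetched.

```python
import gzip
import shutil


def gunzip_file(src, dst):
    """Decompress the gzip file src into dst, streaming in chunks."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


# Hypothetical usage: gunzip_file("netbsd.gz", "netbsd")
```

dosboot is then pointed at the uncompressed file, so it has no
inflating left to do.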

There, wasn't that easy?  Of course it runs NetBSD!  :D

-Chap