Current-Users archive


Re: Heads up! amd64 secondary bootstrap broken?



On Sat, 11 Aug 2012, John Nemeth wrote:

On Nov 26, 10:09pm, Paul Goyette wrote:
}
} I just started an update of one of my machines, and things are now quite
} broken!
}
} I updated my bootblocks as well as installing a new 6.99.10 kernel.  I
} used the following commands:
}
}       # cp $DESTDIR/amd64/usr/mdec/boot /boot
}       # installboot -v /dev/wd0a $DESTDIR/amd64/usr/mdec/bootxx_ffsv1
}
} The machine is now dead.  :(
}
} It boots the primary bootstrap and presents the menu.  It allows me to
} boot memtestplus (from an old install!).  However, when trying to boot
} either the old or new NetBSD kernel, it gets as far as trying to mount
} the root file system, and stops with
}
}       boot device wd0
}       root on wd0a dumps on wd0b
}       cannot mount root, error = 79
}       root device (default wd0a):

    This has very little to do with the boot code.  The primary
purpose of the boot code is to load the kernel and execute it.  The
first message from the kernel is the copyright.  By the time it tries
to mount root, the kernel has completed autoconf (i.e. it has been
running for a while).  Did you look up the error code?

P4-3679GHz: {257} grep 79 /usr/include/sys/errno.h
#define EFTYPE          79              /* Inappropriate file type or format */
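
A plain "grep 79" also matches any other line that merely contains "79"
(for example a value such as 179), so an exact-match lookup can be less
noisy.  The following is only a minimal sketch: the helper name
lookup_errno and the embedded SAMPLE_HEADER text are made up for
illustration, and the sample holds just the one line from the grep
output above rather than the real /usr/include/sys/errno.h.

```python
import re

# Sample text copied from the grep output above (an assumption for the
# sketch); on a real system you would read /usr/include/sys/errno.h.
SAMPLE_HEADER = """\
#define EFTYPE          79              /* Inappropriate file type or format */
"""

# Matches lines of the form: #define ENAME <number> /* description */
DEFINE_RE = re.compile(
    r"^#define\s+(E\w+)\s+(\d+)\s*/\*\s*(.*?)\s*\*/", re.MULTILINE
)


def lookup_errno(header_text, number):
    """Return (name, description) pairs whose value equals number exactly."""
    return [
        (name, desc)
        for name, value, desc in DEFINE_RE.findall(header_text)
        if int(value) == number
    ]


print(lookup_errno(SAMPLE_HEADER, 79))
```

Error numbers differ between operating systems, so the lookup has to
use the header from the system that reported the error.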

Did your dmesg show the correct device for wd0?  What does it show for
drives?  Does your kernel have the correct filesystem for your root
filesystem?  Note: one relatively recent change to /boot is that it no
longer attempts to load ffs.kmod.

} Of course, I'm using a USB keyboard which isn't working at this point,
} so I can't try much of anything else.
}
} I'm going to pull the drive out of the machine and use another box to
} reinstall a working bootxx_ffsv1, but I want to give folks a heads-up
} that there might be some breakage here.

    bootxx_ffsv1 is there strictly to load /boot.  It obviously did
that, so it is working fine.  So far, you haven't provided any evidence
of breakage in the boot code.

Well, duh, yeah, you're right, of course - the boot code by this time has finished all of its work.

HOWEVER,

I did as I had threatened - removed the hard drive and installed it in one of my other machines which still had original boot code. I installed the earlier /boot and mdec/bootxx_ffsv1 onto that disk, and then returned it to the original machine.

That machine now boots just fine all the way to multi-user.

So, while the boot code _should_ have long since done its job and be out of the picture, restoring the original boot code does fix the problem.

Perhaps there's something wrong with some of the parameters that get passed into the kernel from the boot code?


-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------

