Eduardo Horvath wrote:
> On Sun, 1 Feb 2009, Robert Elz wrote:
>
> > As I have said before, I don't really care which way the default
> > for ddb.onpanic is set, but ...
> >
> >   | 1) Not everyone runs X. Most servers use a serial console.
> >
> > Forget X when discussing this issue, X isn't an argument for anything,
> > one way or the other. By the time X gets anywhere near the system,
> > sysctl.conf has run, and the local system owner can trivially decide
> > which behaviour works for them and insert the relevant line into
> > sysctl.conf.
>
> If you check the original email, X was the justification for making this
> change. It's a bogus justification, but I don't think we can ignore it.
>
> >   | 3) There is a period of time between loading the kernel and when
> >   | the rc scripts run where you can't tweak the ddb_onpanic value.
> >
> > Yes, this is why the kernel needs a default value, and why we can
> > sensibly discuss what that default value should be. By itself it
> > doesn't say anything about which particular value should be the
> > default however.
>
> The only time the default setting of this value is important is between
> the time the power is turned on and the time the rc script is run that can
> change the sysctl value. As soon as that happens it is no longer default
> behavior but whatever the sysadmin managing the system desires.
>
> >   | 4) If the machine panics early, say during device configuration
> >   | due to broken hardware, you don't really want it to attempt to
> >   | reboot, since that will result in an infinite reboot loop.
> >
> > Yes, perhaps - it depends upon the cause of the panic, but this
> > can certainly happen. But I'm not sure this is any worse (or
> > better) than the infinite loop the kernel is sitting in waiting
> > for a reply to the db> prompt. Both require user interaction,
> > and nothing proceeds until a user has done something to alter the
> > state of the system.
>
> Well, no. If the system drops to the db> prompt, then it requires user
> intervention. Presumably, all the information about the cause of the
> panic is also sitting there on the screen and has not scrolled off so the
> admin can make an intelligent decision about what the corrective action
> should be.
>
> If, on the other hand, the system is left to attempt to dump core and then
> try an automatic reboot, you have a lot of potentially destructive
> operations that could happen.
>
> Each time the system tries to reboot there will be a set of resets and
> possibly power-cycles. Excessive resets or power cycling can potentially
> damage integrated circuits through thermal cycling or disks through
> spin-up/spin-down cycles.
>
> > As an alternative, if the system panics due to a corrupted filesystem
> > that was incorrectly marked clean, then ddb is of no practical use
> > and a reboot will detect the unclean filesys and fsck (and either
> > fix, or at least tell the user what the problem is).
>
> We are talking very early in the boot process. I have never seen the case
> where a filesystem is so corrupt that fsck is able to clean it but the
> kernel still takes a panic after fsck runs. It used to be that if fsck
> fixed certain problems in the root filesystem the rc scripts would
> automatically reboot the system. I assume that's still the case and a
> reboot won't stop at the db> prompt.
>
> OTOH, if you keep running fsck only part way on the filesystem, you may
> end up doing irreparable damage to it.
>
> And if the system manages to mount the filesystem and run savecore each
> time before it gets to the panic, you end up filling up the root
> filesystem with a series of useless coredumps.

Please, this is only a corner case. If you boot a new kernel, you are
either on-site or have iLO access anyway; rebooting a system without
either has always been a gamble.

>
> Finally, if the system is stuck in a panic loop, how do you diagnose the
> problem? The system boots, prints a panic message, and then it resets
> itself and starts printing the firmware messages which cause the panic
> message to scroll off the screen. I suppose if you're lucky and you can
> convince the machine to get into single-user mode, you can manually set
> ddb.onpanic=1 and then switch to multi-user mode to continue diagnosis.
> But if you can't get to the single-user shell you are SOL and probably
> won't be able to figure out what's causing the problem let alone how to
> fix it.

This is why we have 'boot -c'.

-- 
When in doubt, use brute force.

Adam Hoka <ahoka%NetBSD.org@localhost>
Adam Hoka <ahoka%MirBSD.de@localhost>
Adam Hoka <adam.hoka%gmail.com@localhost>