Subject: disklabel bug?
To: port-i386@netbsd.org
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: port-i386
Date: 01/21/2003 21:30:33
I'm putting some replacement disks into a machine and just got a
repeatable kernel divide-by-zero trap on boot.

On investigation, it turns out that lp->d_secpercyl was zero in
readdisklabel (i386/disksubr.c).  This appears to be due to the way
wdgetdisklabel calls readdisklabel twice: the second call blindly
assumes that whatever values the first call left in the label are
suitable.  Looking at recent -current (from just before the cvs merge;
I haven't updated my cvs fetch since then) makes me think the bug was
probably still present there.
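To make the suspected failure pattern concrete, here is a small userland sketch.  Everything in it is hypothetical: toy_label is a cut-down stand-in for struct disklabel (not its real layout), and toy_first_pass / toy_cylinder_of only mimic the shape of the problem, namely a first pass that overwrites the shared label and then fails, followed by a second pass that divides by d_secpercyl.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, cut-down stand-in for struct disklabel: only the
 * geometry fields this illustration needs.  Not the real layout.
 */
struct toy_label {
	unsigned int d_nsectors;   /* sectors per track */
	unsigned int d_ntracks;    /* tracks per cylinder */
	unsigned int d_secpercyl;  /* sectors per cylinder */
};

/*
 * Hypothetical first pass: copies whatever it read from the disk into
 * the caller's label, then reports failure anyway, so the garbage
 * stays behind in *lp for the next call to trip over.
 */
static const char *
toy_first_pass(struct toy_label *lp, const struct toy_label *ondisk)
{
	assert(lp != NULL && ondisk != NULL);
	*lp = *ondisk;
	return "label unusable";
}

/*
 * Hypothetical second pass: converts a block number to a cylinder
 * number by dividing by d_secpercyl.  Unguarded code would trap here
 * when the first pass left d_secpercyl at zero; this version returns
 * -1 instead.
 */
static long
toy_cylinder_of(const struct toy_label *lp, unsigned long blkno)
{
	assert(lp != NULL);
	if (lp->d_secpercyl == 0)
		return -1;
	return (long)(blkno / lp->d_secpercyl);
}
```

With a default 63-sector, 16-head geometry, block 2016 maps to cylinder 2; after a first pass that copies an all-zero "label" and fails, the same lookup hits the zero divisor.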

This can't be right.  Repeatable divide-by-zero crashes just because a
disk happens to have the wrong garbage on it are...undesirable.

Am I hallucinating, or is there really a bug here?  Worth a send-pr?

My first reaction was to call wdgetdefaultlabel again inside the if
that guards the second readdisklabel call, but that defeats the whole
point of calling readdisklabel twice.  I then tried testing d_secpercyl
in readdisklabel and bashing it if it's zero, but then it crashes
later, while handling the completion interrupt for the transfer.
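One possible middle ground between those two attempts, offered only as an untested sketch: before the retry, keep the geometry the first pass left behind if it looks sane, and restore just the geometry fields from the defaults if it doesn't.  The toy_label type and both functions are hypothetical.  The sanity rule (every field nonzero, d_secpercyl equal to d_nsectors * d_ntracks) is an assumed invariant.  Whether other zeroed fields explain the later interrupt-time crash is also an assumption, not something established above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down stand-in for struct disklabel's geometry. */
struct toy_label {
	unsigned int d_nsectors;   /* sectors per track */
	unsigned int d_ntracks;    /* tracks per cylinder */
	unsigned int d_secpercyl;  /* sectors per cylinder */
};

/*
 * Hypothetical sanity check: usable only if every field is nonzero and
 * sectors-per-cylinder agrees with the other two (assumed invariant).
 */
static bool
toy_geometry_sane(const struct toy_label *lp)
{
	return lp->d_nsectors != 0 && lp->d_ntracks != 0 &&
	    lp->d_secpercyl == lp->d_nsectors * lp->d_ntracks;
}

/*
 * Run before the retry: leave sane first-pass geometry alone (so the
 * retry can still use it), otherwise copy back only the default
 * geometry fields.  Returns true if anything was restored.
 */
static bool
toy_prepare_retry(struct toy_label *lp, const struct toy_label *defaults)
{
	assert(lp != NULL && defaults != NULL);
	if (toy_geometry_sane(lp))
		return false;
	lp->d_nsectors = defaults->d_nsectors;
	lp->d_ntracks = defaults->d_ntracks;
	lp->d_secpercyl = defaults->d_secpercyl;
	return true;
}
```

Restoring all three fields together, rather than patching d_secpercyl alone, keeps them mutually consistent.  Code that computes sector or head numbers from d_nsectors or d_ntracks then never sees a zero either.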

I'm going to fiddle with it and see if I can craft a small vnd image
that provokes the failure, then see if I can get someone to try it
under -current.  (Anyone willing to volunteer as guinea pig? :)

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B