The panic I fixed was a result of interaction of this 13-year-old code with the new nvme(4) code, as well as VirtualBox.
Whilst researching this pattern, I did notice that some other code in the tree does the same thing that Martin wants to do for this instance, too; however, it might appear that it's those instances that are possibly dead and untested code, and should potentially be converted to a similar logic as here (e.g., an explicit div-by-zero test, rather than converting division into multiplication).
My disagreement is mainly on the fact that these devices don't seem to exist, so, adding extra untested code to support these non-existing devices by doing either division or multiplication complicates the logic, and makes it more difficult to review the code.
On the other hand, the early return provided by my patch merely fixes the div-by-zero panic as encountered in VirtualBox under certain conditions, and doesn't result in any panic itself. I.e., I argue that it's mathematically provable that there won't be any possible regressions with the 2019-09-30 code.
* Here's the code from my 2019-09-30 patch that fixes a VirtualBox nvme panic, which is hereby open for review:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/subr_disk.c?only_with_tag=MAIN#rev1.129 - if (lp->d_secpercyl == 0) { + if ((lp->d_secsize / DEV_BSIZE) == 0 || lp->d_secpercyl == 0) { * Here's the context of the code: http://bxr.su/n/sys/kern/subr_disk.c#bounds_check_with_label 388 if ((lp->d_secsize / DEV_BSIZE) == 0 || lp->d_secpercyl == 0) { ... 434 bp->b_cylinder = (bp->b_blkno + p->p_offset) / 435 (lp->d_secsize / DEV_BSIZE) / lp->d_secpercyl;* Here's where the div-by-zero was originally introduced in this file nearly 13 years ago on 2006-11-25:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/subr_disk.c?only_with_tag=MAIN#rev1.83 + bp->b_cylinder = (bp->b_blkno + p->p_offset) / + (lp->d_secsize / DEV_BSIZE) / lp->d_secpercyl;* Here's some other file that does what Martin suggests we do here instead of what was done on 2019-09-30, it's also a common pattern for arch-specific code:
http://bxr.su/n/sys/kern/subr_disk_mbr.c#readdisklabel 480 blkno = lp->d_secperunit - lp->d_nsectors + i; 481 if (lp->d_secsize > DEV_BSIZE) 482 blkno *= lp->d_secsize / DEV_BSIZE; 483 else 484 blkno /= DEV_BSIZE / lp->d_secsize; Potential resolutions: * Leave all code as-is (all known panics were fixed).* Convert the subr_disk.c pattern to subr_disk_mbr.c multiply-or-div pattern (supporting imaginary devices with secsize below DEV_BSIZE).
* Convert the multiply-or-div pattern in subr_disk_mbr.c and some arch-specific files to remove support for these devices which probably don't exist.
For posterity, this discussion originally started at http://releng.netbsd.org/cgi-bin/req-8.cgi?show=1397.
I'd be happy to learn more on this topic here. Cheers, Constantine. On 2019-W44-1 19:24 +0100, Martin Husemann wrote:
Hi folks, Constantine and I disagree on a small change he made recently. The issue does not come up in any normal environment, and has apparently not happened at all in the last years (or so). The bug is pretty old. Actually it is a combination of several bugs that triggered it now: - when configuring VirtualBox wiht a NVMe controller, but no drive, strange things happen (not sure if that is a VirtualBox issue) - untill recently (this has been fixed) in that situation our kernel code fully attached a ld device to the non-functional NVMe. - when accessing a non-raw partition on such a bogus ld device (easy trigger: "dumpfs ld0a") a div by zero killed the kernel Constantine fixed that with a simple and effective change: --- subr_disk.c 22 May 2019 08:47:02 -0000 1.128 +++ subr_disk.c 30 Sep 2019 23:23:59 -0000 1.129 @@ -385,7 +385,7 @@ bounds_check_with_label(struct disk *dk, }/* Protect against division by zero. XXX: Should never happen?!?! */- if (lp->d_secpercyl == 0) { + if ((lp->d_secsize / DEV_BSIZE) == 0 || lp->d_secpercyl == 0) { bp->b_error = EINVAL; return -1; } This makes bounds_check_with_label() fail early for all devices with small sector size. Instead I would like to fix it differently. The div by zero happend when calculating the cylinder number for the buffer - which is not used a lot typically (or not at all). This looks like: Index: subr_disk.c =================================================================== RCS file: /cvsroot/src/sys/kern/subr_disk.c,v retrieving revision 1.119.2.1 diff -u -p -r1.119.2.1 subr_disk.c --- subr_disk.c 5 Apr 2019 08:40:19 -0000 1.119.2.1 +++ subr_disk.c 27 Oct 2019 10:20:28 -0000 @@ -364,7 +364,7 @@ bounds_check_with_label(struct disk *dk, { struct disklabel *lp = dk->dk_label; struct partition *p = lp->d_partitions + DISKPART(bp->b_dev); - uint64_t p_size, p_offset, labelsector; + uint64_t p_size, p_offset, labelsector, blocks; int64_t sz;if (bp->b_blkno < 0) {@@ -420,8 +420,13 @@ bounds_check_with_label(struct disk *dk, }/* calculate cylinder for disksort to order transfers with */+ if (lp->d_secsize < DEV_BSIZE) + blocks = 1; + else + blocks = lp->d_secsize / DEV_BSIZE; + bp->b_cylinder = (bp->b_blkno + p->p_offset) / - (lp->d_secsize / DEV_BSIZE) / lp->d_secpercyl; + blocks / lp->d_secpercyl; return 1; }The practical difference is likely zero, as such setup just do not happen in real life (and the other error(s) needed to get here being fixed). I prefer my version because it does not introduce artifical limits on the sector size - but it is not a very strong technical argument. What do you think? Martin