tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Dealing with strange disk devices



To provide a bit more context: this code path with the potential division by zero has been in the tree for over a decade; if there exist any real devices with sector size below DEV_BSIZE, they would have resulted in a div-by-zero since 2006-11-25.

The panic I fixed was a result of interaction of this 13-year-old code with the new nvme(4) code, as well as VirtualBox.

Whilst researching this pattern, I did notice that some other code in the tree does the same thing that Martin wants to do for this instance, too; however, it might appear that it's those instances that are possibly dead and untested code, and should potentially be converted to a similar logic as here (e.g., an explicit div-by-zero test, rather than converting division into multiplication).

My disagreement is mainly on the fact that these devices don't seem to exist, so, adding extra untested code to support these non-existing devices by doing either division or multiplication complicates the logic, and makes it more difficult to review the code.

On the other hand, the early return provided by my patch merely fixes the div-by-zero panic as encountered in VirtualBox under certain conditions, and doesn't result in any panic itself. I.e., I argue that it's mathematically provable that there won't be any possible regressions with the 2019-09-30 code.


* Here's the code from my 2019-09-30 patch that fixes a VirtualBox nvme panic, which is hereby open for review:

 http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/subr_disk.c?only_with_tag=MAIN#rev1.129

-	if (lp->d_secpercyl == 0) {
+	if ((lp->d_secsize / DEV_BSIZE) == 0 || lp->d_secpercyl == 0) {


* Here's the context of the code:

 http://bxr.su/n/sys/kern/subr_disk.c#bounds_check_with_label

388    if ((lp->d_secsize / DEV_BSIZE) == 0 || lp->d_secpercyl == 0) {
...
434    bp->b_cylinder = (bp->b_blkno + p->p_offset) /
435        (lp->d_secsize / DEV_BSIZE) / lp->d_secpercyl;


* Here's where the div-by-zero was originally introduced in this file nearly 13 years ago on 2006-11-25:

 http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/subr_disk.c?only_with_tag=MAIN#rev1.83

+	bp->b_cylinder = (bp->b_blkno + p->p_offset) /
+	    (lp->d_secsize / DEV_BSIZE) / lp->d_secpercyl;


* Here's some other file that does what Martin suggests we do here instead of what was done on 2019-09-30, it's also a common pattern for arch-specific code:

 http://bxr.su/n/sys/kern/subr_disk_mbr.c#readdisklabel

480            blkno = lp->d_secperunit - lp->d_nsectors + i;
481            if (lp->d_secsize > DEV_BSIZE)
482                blkno *= lp->d_secsize / DEV_BSIZE;
483            else
484                blkno /= DEV_BSIZE / lp->d_secsize;


Potential resolutions:

* Leave all code as-is (all known panics were fixed).

* Convert the subr_disk.c pattern to subr_disk_mbr.c multiply-or-div pattern (supporting imaginary devices with secsize below DEV_BSIZE).

* Convert the multiply-or-div pattern in subr_disk_mbr.c and some arch-specific files to remove support for these devices which probably don't exist.

For posterity, this discussion originally started at http://releng.netbsd.org/cgi-bin/req-8.cgi?show=1397.

I'd be happy to learn more on this topic here.

Cheers,
Constantine.

On 2019-W44-1 19:24 +0100, Martin Husemann wrote:
Hi folks,

Constantine and I disagree on a small change he made recently. The issue
does not come up in any normal environment, and has apparently not happened
at all in the last years (or so). The bug is pretty old. Actually it is a
combination of several bugs that triggered it now:

 - when configuring VirtualBox wiht a NVMe controller, but no drive,
   strange things happen (not sure if that is a VirtualBox issue)
 - untill recently (this has been fixed) in that situation our kernel code
   fully attached a ld device to the non-functional NVMe.
 - when accessing a non-raw partition on such a bogus ld device (easy
   trigger: "dumpfs ld0a") a div by zero killed the kernel

Constantine fixed that with a simple and effective change:

--- subr_disk.c 22 May 2019 08:47:02 -0000      1.128
+++ subr_disk.c 30 Sep 2019 23:23:59 -0000      1.129
@@ -385,7 +385,7 @@ bounds_check_with_label(struct disk *dk,
        }
/* Protect against division by zero. XXX: Should never happen?!?! */
-       if (lp->d_secpercyl == 0) {
+       if ((lp->d_secsize / DEV_BSIZE) == 0 || lp->d_secpercyl == 0) {
                bp->b_error = EINVAL;
                return -1;
        }


This makes bounds_check_with_label() fail early for all devices with small
sector size.

Instead I would like to fix it differently. The div by zero happend when
calculating the cylinder number for the buffer - which is not used a lot
typically (or not at all). This looks like:

Index: subr_disk.c
===================================================================
RCS file: /cvsroot/src/sys/kern/subr_disk.c,v
retrieving revision 1.119.2.1
diff -u -p -r1.119.2.1 subr_disk.c
--- subr_disk.c	5 Apr 2019 08:40:19 -0000	1.119.2.1
+++ subr_disk.c	27 Oct 2019 10:20:28 -0000
@@ -364,7 +364,7 @@ bounds_check_with_label(struct disk *dk,
 {
 	struct disklabel *lp = dk->dk_label;
 	struct partition *p = lp->d_partitions + DISKPART(bp->b_dev);
-	uint64_t p_size, p_offset, labelsector;
+	uint64_t p_size, p_offset, labelsector, blocks;
 	int64_t sz;
if (bp->b_blkno < 0) {
@@ -420,8 +420,13 @@ bounds_check_with_label(struct disk *dk,
 	}
/* calculate cylinder for disksort to order transfers with */
+	if (lp->d_secsize < DEV_BSIZE)
+		blocks = 1;
+	else
+		blocks = lp->d_secsize / DEV_BSIZE;
+		
 	bp->b_cylinder = (bp->b_blkno + p->p_offset) /
-	    (lp->d_secsize / DEV_BSIZE) / lp->d_secpercyl;
+	    blocks / lp->d_secpercyl;
 	return 1;
 }
The practical difference is likely zero, as such setup just do not happen
in real life (and the other error(s) needed to get here being fixed).
I prefer my version because it does not introduce artifical limits on the
sector size - but it is not a very strong technical argument.

What do you think?

Martin



Home | Main Index | Thread Index | Old Index