Re: Dealing with strange disk devices

To: Martin Husemann <martin%duskware.de@localhost>
Subject: Re: Dealing with strange disk devices
From: "Constantine A. Murenin" <cnst%NetBSD.org@localhost>
Date: Mon, 28 Oct 2019 12:48:41 -0700

To provide a bit more context: this code path with the potentialdivision by zero has been in the tree for over a decade; if thereexist any real devices with sector size below DEV_BSIZE, theywould have resulted in a div-by-zero since 2006-11-25.

The panic I fixed was a result of interaction of this 13-year-old codewith the new nvme(4) code, as well as VirtualBox.

Whilst researching this pattern, I did notice that some othercode in the tree does the same thing that Martin wants to dofor this instance, too; however, it might appear that it'sthose instances that are possibly dead and untested code, andshould potentially be converted to a similar logic as here (e.g., anexplicit div-by-zero test, rather than convertingdivision into multiplication).

My disagreement is mainly on the fact that these devices don'tseem to exist, so, adding extra untested code to support thesenon-existing devices by doing either division or multiplicationcomplicates the logic, and makes it more difficult to review the code.

On the other hand, the early return provided by my patch merely fixesthe div-by-zero panic as encountered in VirtualBox under certain conditions,and doesn't result in any panic itself.I.e., I argue that it's mathematically provable that therewon't be any possible regressions with the 2019-09-30 code.

* Here's the code from my 2019-09-30 patch that fixes a VirtualBox nvme panic,which is hereby open for review:


 http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/subr_disk.c?only_with_tag=MAIN#rev1.129

-	if (lp->d_secpercyl == 0) {
+	if ((lp->d_secsize / DEV_BSIZE) == 0 || lp->d_secpercyl == 0) {


* Here's the context of the code:

 http://bxr.su/n/sys/kern/subr_disk.c#bounds_check_with_label

388    if ((lp->d_secsize / DEV_BSIZE) == 0 || lp->d_secpercyl == 0) {
...
434    bp->b_cylinder = (bp->b_blkno + p->p_offset) /
435        (lp->d_secsize / DEV_BSIZE) / lp->d_secpercyl;

* Here's where the div-by-zero was originally introducedin this file nearly 13 years ago on 2006-11-25:


 http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/subr_disk.c?only_with_tag=MAIN#rev1.83

+	bp->b_cylinder = (bp->b_blkno + p->p_offset) /
+	    (lp->d_secsize / DEV_BSIZE) / lp->d_secpercyl;

* Here's some other file that does what Martin suggestswe do here instead of what was done on 2019-09-30,it's also a common pattern for arch-specific code:


 http://bxr.su/n/sys/kern/subr_disk_mbr.c#readdisklabel

480            blkno = lp->d_secperunit - lp->d_nsectors + i;
481            if (lp->d_secsize > DEV_BSIZE)
482                blkno *= lp->d_secsize / DEV_BSIZE;
483            else
484                blkno /= DEV_BSIZE / lp->d_secsize;


Potential resolutions:

* Leave all code as-is (all known panics were fixed).

* Convert the subr_disk.c pattern to subr_disk_mbr.c multiply-or-div pattern(supporting imaginary devices with secsize below DEV_BSIZE).

* Convert the multiply-or-div pattern in subr_disk_mbr.c and somearch-specific files to remove support for these deviceswhich probably don't exist.

For posterity, this discussion originally started athttp://releng.netbsd.org/cgi-bin/req-8.cgi?show=1397.


I'd be happy to learn more on this topic here.

Cheers,
Constantine.

On 2019-W44-1 19:24 +0100, Martin Husemann wrote:

Hi folks,

Constantine and I disagree on a small change he made recently. The issue
does not come up in any normal environment, and has apparently not happened
at all in the last years (or so). The bug is pretty old. Actually it is a
combination of several bugs that triggered it now:

 - when configuring VirtualBox wiht a NVMe controller, but no drive,
   strange things happen (not sure if that is a VirtualBox issue)
 - untill recently (this has been fixed) in that situation our kernel code
   fully attached a ld device to the non-functional NVMe.
 - when accessing a non-raw partition on such a bogus ld device (easy
   trigger: "dumpfs ld0a") a div by zero killed the kernel

Constantine fixed that with a simple and effective change:

--- subr_disk.c 22 May 2019 08:47:02 -0000      1.128
+++ subr_disk.c 30 Sep 2019 23:23:59 -0000      1.129
@@ -385,7 +385,7 @@ bounds_check_with_label(struct disk *dk,
        }

/* Protect against division by zero. XXX: Should never happen?!?! */

-       if (lp->d_secpercyl == 0) {
+       if ((lp->d_secsize / DEV_BSIZE) == 0 || lp->d_secpercyl == 0) {
                bp->b_error = EINVAL;
                return -1;
        }


This makes bounds_check_with_label() fail early for all devices with small
sector size.

Instead I would like to fix it differently. The div by zero happend when
calculating the cylinder number for the buffer - which is not used a lot
typically (or not at all). This looks like:

Index: subr_disk.c
===================================================================
RCS file: /cvsroot/src/sys/kern/subr_disk.c,v
retrieving revision 1.119.2.1
diff -u -p -r1.119.2.1 subr_disk.c
--- subr_disk.c	5 Apr 2019 08:40:19 -0000	1.119.2.1
+++ subr_disk.c	27 Oct 2019 10:20:28 -0000
@@ -364,7 +364,7 @@ bounds_check_with_label(struct disk *dk,
 {
 	struct disklabel *lp = dk->dk_label;
 	struct partition *p = lp->d_partitions + DISKPART(bp->b_dev);
-	uint64_t p_size, p_offset, labelsector;
+	uint64_t p_size, p_offset, labelsector, blocks;
 	int64_t sz;

if (bp->b_blkno < 0) {

@@ -420,8 +420,13 @@ bounds_check_with_label(struct disk *dk,
 	}

/* calculate cylinder for disksort to order transfers with */

+	if (lp->d_secsize < DEV_BSIZE)
+		blocks = 1;
+	else
+		blocks = lp->d_secsize / DEV_BSIZE;
+		
 	bp->b_cylinder = (bp->b_blkno + p->p_offset) /
-	    (lp->d_secsize / DEV_BSIZE) / lp->d_secpercyl;
+	    blocks / lp->d_secpercyl;
 	return 1;
 }

The practical difference is likely zero, as such setup just do not happen
in real life (and the other error(s) needed to get here being fixed).
I prefer my version because it does not introduce artifical limits on the
sector size - but it is not a very strong technical argument.

What do you think?

Martin

Follow-Ups:
- Re: Dealing with strange disk devices
  - From: Mouse

References:
- Dealing with strange disk devices
  - From: Martin Husemann

Prev by Date: Re: Dealing with strange disk devices
Next by Date: Re: Dealing with strange disk devices
Previous by Thread: Re: Dealing with strange disk devices
Next by Thread: Re: Dealing with strange disk devices
Indexes:

Home | Main Index | Thread Index | Old Index