NetBSD-Users archive


Re: disk geometry (i386/amd64)



On 9/11/2018 6:28 AM, Robert Elz wrote:
     Date:        Tue, 11 Sep 2018 00:19:57 -0700
     From:        Don NetBSD <netbsd-embedded%gmx.com@localhost>
     Message-ID:  <3cedac34-90d8-78ff-b320-de2c5ac8c5d4%gmx.com@localhost>

   | [should I be "reply all" or just the list?  I guess a matter of personal
   | preferences?]

It usually makes little difference - it certainly makes no difference to me.
People who get annoyed by receiving 2 copies, usually ask to be
excluded.

Then I'll err on the side of just replying to the list.

    | > The raw partition allows this.
    | Again, as long as nothing else tinkers with the in-kernel copy of the disklabel
    | before I look at it.

No, regardless of that.

   | Please reread the initial exchange (reproduced below, for convenience):

I know what you mean, you're just missing the point.   From the raw
partition you can discover everything that you need.  The label is
irrelevant for your use, ignore it.   You can use the new ioctl if it
works to get info from the drive, if not, you just do it the hard way.
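
One possible reading of "the hard way" is to probe the raw partition directly and find the last sector that can still be read. The sketch below is only an illustration of that idea, not something taken from the thread: the function names, the caller-supplied sector size, and the assumption that a read past the end of the medium fails or comes back short are all mine. It also assumes a 64-bit off_t and a drive with no unreadable sectors in the middle, since a bad sector would mislead the search.

```c
#define _XOPEN_SOURCE 700	/* for pread(); harmless where already implied */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if one whole sector can be read at index `sector`, else 0. */
static int
sector_readable(int fd, void *buf, size_t sector_size, uint64_t sector)
{
	ssize_t n;

	/* Refuse byte offsets that would overflow a signed 64-bit off_t. */
	if (sector > (uint64_t)INT64_MAX / sector_size)
		return 0;
	n = pread(fd, buf, sector_size, (off_t)(sector * sector_size));
	return n == (ssize_t)sector_size;
}

/*
 * Hypothetical helper: count the whole sectors readable from fd.
 * Doubles an upper bound until a read fails, then bisects between
 * the last readable sector and the first unreadable one.
 * Returns 0 if even sector 0 cannot be read (or on allocation failure).
 */
uint64_t
probe_sector_count(int fd, size_t sector_size)
{
	uint64_t lo = 0, hi = 1, mid;
	void *buf;

	assert(sector_size > 0);
	buf = malloc(sector_size);
	if (buf == NULL)
		return 0;
	if (!sector_readable(fd, buf, sector_size, 0)) {
		free(buf);
		return 0;
	}
	/* Invariant from here on: sector `lo` is readable. */
	while (sector_readable(fd, buf, sector_size, hi)) {
		lo = hi;
		hi *= 2;
	}
	/* Now `lo` is readable and `hi` is not; narrow the gap. */
	while (hi - lo > 1) {
		mid = lo + (hi - lo) / 2;
		if (sector_readable(fd, buf, sector_size, mid))
			lo = mid;
		else
			hi = mid;
	}
	free(buf);
	return lo + 1;
}
```

On i386/amd64 the descriptor would come from opening the raw whole-disk partition (for example /dev/rsd0d) read-only. The search costs on the order of 2*log2(N) single-sector reads, and it never consults the disklabel.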

You're ignoring:  the fact that the firmware for our existing products already
gets the device information directly from the device; the dweeb's suggested
Linux implementation would make all of this visible via hdparm or lshw; and
the politics involved in most design decisions.

   "Why all of this code when our products do it in one line of code?
   Is NetBSD *that* brain damaged??  Maybe we *should* have gone the
   Linux route... (?)"

[Silly to argue technical issues with managers that haven't written a
line of code in decades -- yet, strangely, feel AS IF they still
understand all the issues!]

   | Again, as long as nothing else tinkers with the in-kernel copy of the disklabel
   | before I look at it.

Just don't look at it at all.

The question was whether DIOCGDISKINFO looked at *something else* (as it
doesn't seem to be bothered by changes to the in-kernel label) and would
avoid the in-kernel disk label "risk" completely.

However, your "as long as nothing else tinkers" seems a peculiar worry
given the application you seem to have in mind.   What "something else"
could possibly be doing the tinkering in the environment you are describing?

If the approach "works", one can almost bet that it will be embellished to
perform other duties.  And, that the folks doing that embellishment are likely
not going to be "me" nor think that they should ask me if there are any
hidden gotchas to avoid.

E.g., the products contain code to "heal" corrupted disk images (to a certain
extent).  One easy way to *test* this code is to construct disk images that are
known to be corrupt.  Install those images on physical media (hey!  Maybe using
this very same appliance!!).  Then, install the media in "product" and let it
"fix" the image.  Verify it has done so, properly, by examining the NEW image
(hey!  Maybe using this very same appliance!)

Or, the shake-and-bake guys might want to use it to stress test the drive
components.  (Have they failed?  Has the number of remapped sectors increased?
Has seek time suffered?)

Or, ...

Relying on some set of cryptic rules like "don't alter the in-kernel disk label
lest something SILENTLY misbehave" is a recipe for disaster.  Especially when
the appliance is seen as the authority for the "disk imaging process" (i.e.,
this first application).

   | (which appears to be DIOCGDISKINFO, but not DIOCGDINFO)

Probably, yes.   If that (DIOCGDISKINFO) works, great, if not,
it is all still possible.

Yes -- discard the servers, grab existing product off the Line, modify
the firmware to use the information that *it* gathered from the attached
disk and then reproduce as many of these "test fixtures" as you need to
process N drives in parallel.

   | Ah, OK.  So, if I verify this for the sd(4) driver on a particular OS
   | version/port, then I need never concern myself that some particular *drive*
   | may fail to yield valid data?  I.e., if the ioctl fails, I can panic()?

You could, but that would not be my recommended action.   Certainly
drives have made their size available to the driver ever since we started
getting intelligent drives, I very much doubt you could find one still working
which did not support that - anywhere.   But is dealing with that case
so hard, compared with all else you are doing, that a panic is acceptable?
(Even given you just mean an application panic, as in "discard the drive
in slot 23" and not "crash the kernel")
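
The "application panic" alternative can be sketched as per-slot error handling: a failed size query marks that one drive as rejected and the appliance carries on with the others. Everything named here (struct slot, slot_init, the size_query_fn callback, and the demo_query stub that pretends slot 23 is broken) is hypothetical and stands in for whatever query the real appliance would use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum slot_state { SLOT_OK, SLOT_REJECTED };

struct slot {
	int		id;
	enum slot_state	state;
	uint64_t	sectors;	/* valid only when state == SLOT_OK */
};

/* Hypothetical callback: fills *sectors and returns 0, or returns -1. */
typedef int (*size_query_fn)(int slot_id, uint64_t *sectors);

/*
 * Initialise one slot.  A failed or zero-sized query rejects the drive
 * in that slot only; the caller keeps processing the remaining slots.
 */
int
slot_init(struct slot *s, int id, size_query_fn query)
{
	uint64_t n = 0;

	assert(s != NULL && query != NULL);
	s->id = id;
	s->sectors = 0;
	if (query(id, &n) != 0 || n == 0) {
		s->state = SLOT_REJECTED;
		fprintf(stderr, "slot %d: size query failed, drive discarded\n",
		    id);
		return -1;
	}
	s->state = SLOT_OK;
	s->sectors = n;
	return 0;
}

/* Demonstration stub only: slot 23 fails, every other slot has 1000 sectors. */
int
demo_query(int slot_id, uint64_t *sectors)
{
	if (slot_id == 23)
		return -1;
	*sectors = 1000;
	return 0;
}
```

Whether the "can't happen" case deserves even this much code is the judgment call under discussion; the sketch only shows that the cost of not aborting is small.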

My point was it should be a "can't happen" -- that really CAN'T happen.
E.g., when our products power up and bring the drive on-line, if it fails
to respond to all queries/actions as expected, we throw an error and the
product doesn't work.  There's no "user remedy" other than "return for
service".  No backup drive.  No user replaceable parts.  No mode of degraded
operation.

   | > SMART [...] so it is clearly possible.
   |
   | I think only via wd(4)?

Oh, you mean, not sd(4) - yes, possibly.   Sorry, I have no idea how
one would access that kind of data over scsi.

I will have to keep poking through the manual pages.  Unfortunately, there
are a fair number of "little details" that are missing so I spend a lot
of time wading through the sources for things that the documentation authors
"took for granted".


