Subject: Re: disktab(5)
To: der Mouse <port-sparc@NetBSD.org>
From: Don Yuniskis <auryn@gci-net.com>
List: port-sparc
Date: 11/27/2001 14:50:48
Greetings and Consternations!

>der mouse meowed:


>>>> So, you figure it out sometime because you *need* to know -- now
>>>> where do you *remember* that?
>>> In my experience, the drive remembers it for me and tells the kernel
>>> at boot time.
>> Yes, but you are counting on having the drives on a box that has
>> NetBSD (etc.) installed.  Or, schlepping the (external) drive over to
>> such a box.
>
>Yes.  This technique may not be suitable for everybody, but that's how
>I do it, that's my answer to your question. :-)


Understood.

>> I find it much easier to just compile the information in one place
>> where I am *likely* to go looking for it.  disktab(5) seems as good a
>> place as any!
>
>I suppose, though that helps only with giving you capacity and geometry
>info for a disk type - it doesn't keep inventory for you, which at
>least for me would be the larger task.  Or, to put it another way...
>...how do you tell "ST12400N with partial alpha install on it" from
>"ST12400N with Sun label and empty fs" in disktab?


I don't see why you can't put that in the name field of the
disktab or in a comment, etc.  Note that it's just a text
file so you can use it as you like...  :>

>>> (The value a drive reports for sect/track is usually an average
>>> obtained by dividing the total size by the cylinder and head
>>> counts.)
>> Yes.  And the average is usually not an integral value.
>
>True.  But believing it generally loses you only a comparatively small
>amount of space, in my experience.


Well, in the context of 40G drives, true.  :>

But note that the "fractional sector" is multiplied by the product
of the number of cylinders and the number of heads.  I.e.
capacity (in sectors) = H * S * C.  Given that 1,000 cylinders is pretty
commonplace, for every head (H) in the geometry, each incremental
sector represents half a megabyte (assuming se#512).  So, 16 heads
means a sector is worth 8M.  Or, on average, a "fractional sector"
is worth 4M.  Again, puny on a big disk but it can be significant
on a 200M drive, for example.  (I am thinking about the issues I
had to address when tinkering with the tiny drives in the IPCs.)

In the ATA world, the number of cylinders can be as large as
64K.  So, a drive that uses more than 16 heads would undoubtedly
use an internal translation to make the geometry appear as
no more than 16H, 255S by increasing the effective number of
cylinders.  (I.e. a "fractional sector" could be much larger -- but
this would be in the context of a much larger *drive*!)
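A small Python sketch of the arithmetic above.  The 650,000-sector, 1,000-cylinder, 16-head drive is a made-up example, not a real model, and the function names are hypothetical.  It computes the average sect/track implied by C and H, how many sectors are lost by truncating that to an integer, and what one incremental sector per track is worth.

```python
SECTOR_BYTES = 512  # se#512, as assumed in the text


def average_sectors_per_track(total_sectors, cylinders, heads):
    """The (usually non-integral) sect/track value implied by C and H."""
    return total_sectors / (cylinders * heads)


def sectors_lost_to_truncation(total_sectors, cylinders, heads):
    """Sectors left unused if sect/track is rounded down to an integer."""
    ns = total_sectors // (cylinders * heads)
    return total_sectors - cylinders * heads * ns


def bytes_per_sector_increment(cylinders, heads, sector_bytes=SECTOR_BYTES):
    """Capacity added by one more sector per track: C * H * se bytes."""
    return cylinders * heads * sector_bytes


if __name__ == "__main__":
    # Hypothetical small drive: 1,000 cylinders, 16 heads, 650,000 sectors.
    print(average_sectors_per_track(650_000, 1000, 16))   # 40.625
    print(sectors_lost_to_truncation(650_000, 1000, 16))  # 10000 sectors
    print(bytes_per_sector_increment(1000, 16))           # 8192000 bytes ("8M")
    # ATA-style translation: 64K cylinders at 16 heads.
    print(bytes_per_sector_increment(65536, 16))          # 536870912 bytes
```

The last line shows why the fractional sector grows with a translated geometry: at 64K cylinders and 16 heads, one sector per track is worth 512 MiB, so an average half-sector is worth about 256 MiB.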

>> This is the essence of my query regarding how important geometry and
>> "sectors per unit" are to the OS!
>
>> If, for example, the OS only cares about sectors per unit, then
>> putting *anything* in the other fields should be acceptable, right!
>> I.e. 1C/1H/1S should work for *all* disks!
>
>Yes.  Except that's not quite the case - close, but not quite.
>
>With SCSI disks, I've always found it works to put nhead=32 sect/trk=64
>in the label, which gives 1M "cylinders"; it has occasionally been
>convenient to have all my disks partitioned up in chunks of the same

Makes sense.  I, instead, fabricate a fictitious geometry
(using factor(6)) that exactly coincides with the reported
capacity of the drive.  One of those factors is usually the
actual number of heads (for obvious raisins).  I feel
confident doing this because I "know" the drive has
smarts in it, etc.

I haven't been as optimistic with IDE drives...
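A rough Python sketch of the two approaches: an exact fictitious geometry found by factoring the reported capacity while keeping the real head count as one factor (the job factor(6) does by hand here), and the fixed 32-head, 64-sector "1M cylinder" label from the quoted text, which rounds up the partial cylinder at the end.  The 255 sect/track ceiling, the 1,000,000-sector capacity, and the function names are assumptions for illustration.

```python
import math


def divisors(n):
    """All positive divisors of n, in ascending order."""
    small = [d for d in range(1, math.isqrt(n) + 1) if n % d == 0]
    return sorted(set(small + [n // d for d in small]))


def exact_geometry(total_sectors, heads, max_sectors=255):
    """Return (C, H, S) with C * H * S == total_sectors exactly.

    heads should be the drive's real head count; S is the largest divisor
    of total_sectors / heads that does not exceed max_sectors.
    """
    if total_sectors % heads:
        raise ValueError("capacity is not a multiple of the head count")
    per_head = total_sectors // heads
    sectors = max(d for d in divisors(per_head) if d <= max_sectors)
    return per_head // sectors, heads, sectors


def fixed_geometry(total_sectors, heads=32, sectors=64):
    """32 x 64 x 512 bytes = 1M per "cylinder"; round the count up."""
    cylinders = -(-total_sectors // (heads * sectors))  # ceiling division
    return cylinders, heads, sectors


if __name__ == "__main__":
    print(exact_geometry(1_000_000, 16))  # (250, 16, 250)
    print(fixed_geometry(1_000_000))      # (489, 32, 64)
```

The rounded-up fixed geometry claims slightly more space than the drive has (489 * 2048 > 1,000,000); per the quoted text, SunOS whines about that at boot and NetBSD does not.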

>size.  This is what I do with all my NetBSD-only SCSI disks[%]; I've
>done it under SunOS and find that if I set the cylinder count the way I
>usually do (rounding up the partial "cylinder" at the end), the kernel
>whines at boot time but everything works.  NetBSD doesn't even whine.
>
>[%] I have one that as yet has to coexist with another OS, because
>    NetBSD/mac68k is not 100% self-hosting in the version I have; I
>    need a MacOS partition to bounce off of at boot time.


Yes.  I too am awaiting a boot loader that runs on that bare iron.
:-/

>IDE disks, as I said, I don't try to meddle with; there's too much
>magic I don't understand going on.


Aside from the limitations imposed by the Pea Sea and its
BIOS (geometry issues, mainly), the biggest problem I see is
trying to coexist with CHS addressing *and* LBA addressing
schemes -- instead of just "requiring" everything to be LBA
(this, of course, makes legacy drives harder to use).

The LBA scheme numbers sectors consecutively: first the ns sectors
within a track (numbered 1 to ns in CHS terms), then moving on to
the next *head*, then finally moving on to the next *cylinder*.
So, LBA is (((C * nh) + H) * ns) + S - 1 (assuming C & H are
"0 based" and S is "1 based").
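The formula and its inverse as a Python sketch (function names are hypothetical).  C and H are 0-based and S is 1-based, as above; the inverse only gives back the original CHS if it is run with the same nh and ns.

```python
def chs_to_lba(c, h, s, nh, ns):
    """LBA = (((C * nh) + H) * ns) + S - 1, with C, H 0-based and S 1-based."""
    return ((c * nh) + h) * ns + s - 1


def lba_to_chs(lba, nh, ns):
    """Inverse of chs_to_lba for a given heads/sectors geometry."""
    c, rem = divmod(lba, nh * ns)
    h, s0 = divmod(rem, ns)
    return c, h, s0 + 1  # back to a 1-based sector number


if __name__ == "__main__":
    print(lba_to_chs(57, 9, 5))       # (1, 2, 3)
    print(chs_to_lba(1, 2, 3, 9, 5))  # 57
```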

In the FreeBSD fiasco that I mentioned, if the driver assumes
one geometry and the *drive* does the translation (while operating
in CHS mode!) using *another*, you end up with the driver's
idea of a sector's address (LBA) being different from the
*drive's*!

If, for example, the driver is translating under a 10C/9H/5S
geometry, then a request for LBA 57 maps to 1C/2H/3S (if I
have done the math right in my head  :>).  Now, when the
1C/2H/3S is passed to the cylinder/head/sector registers
in the IDE interface, the *drive* maps this back into a
physical sector using a similar inverse operation.

But, if the drive is translating under a 9C/10H/5S geometry
(just to make the math easy), it will translate this to
physical sector (((1*10)+2)*5)+3-1 -> 62. (again, forgive my
arithmetic  :<)
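The mismatch can be reproduced in a few lines of Python: the driver turns an LBA into CHS using its geometry, and the drive turns that CHS back into a sector number using its own.  The geometries are the example ones from the text; the helper names are hypothetical.

```python
def lba_to_chs(lba, nh, ns):
    c, rem = divmod(lba, nh * ns)
    h, s0 = divmod(rem, ns)
    return c, h, s0 + 1  # S is 1-based


def chs_to_lba(c, h, s, nh, ns):
    return ((c * nh) + h) * ns + s - 1


def sector_drive_uses(lba, driver_geom, drive_geom):
    """Sector the drive actually touches when the driver asks for lba.

    Each geometry is a (cylinders, heads, sectors) tuple.
    """
    _, d_nh, d_ns = driver_geom
    _, r_nh, r_ns = drive_geom
    c, h, s = lba_to_chs(lba, d_nh, d_ns)   # driver's translation
    return chs_to_lba(c, h, s, r_nh, r_ns)  # drive's translation


DRIVER = (10, 9, 5)  # 10C/9H/5S
DRIVE = (9, 10, 5)   # 9C/10H/5S

if __name__ == "__main__":
    print(sector_drive_uses(57, DRIVER, DRIVE))  # 62
```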

Neither side of the interface sees any problems -- until the
driver starts looking to access block numbers near the end of
the disk!  For example, the last block on the disk is
#449 -- which occurs at 9C/8H/5S based on the driver's idea 
of the disk's geometry (check:  (((9*9)+8)*5)+5-1 = 449).

But, when the drive gets this request, it sees it as
requesting sector number #494!  (check: (((9*10)+8)*5)+5-1 = 494)
That is obviously beyond the end of the medium, since the drive's
9C/10H/5S geometry only has 450 sectors (#0 through #449).
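Scanning every block of the example disk shows why neither side notices anything at first.  A Python sketch using the same example geometries (the function name is hypothetical): only the driver's cylinder 0 maps to the right place, the middle of the disk is silently misplaced, and only the driver's last cylinder runs off the end of the drive's 450 sectors.

```python
def classify(driver_geom, drive_geom):
    """Split the driver's LBAs by where the drive actually puts them.

    Geometries are (cylinders, heads, sectors) tuples.  Returns three lists:
    LBAs the drive maps correctly, LBAs it maps to some other (valid)
    sector, and LBAs it maps past the end of its own geometry.
    """
    d_nc, d_nh, d_ns = driver_geom
    r_nc, r_nh, r_ns = drive_geom
    drive_total = r_nc * r_nh * r_ns
    correct, misplaced, beyond = [], [], []
    for lba in range(d_nc * d_nh * d_ns):
        c, rem = divmod(lba, d_nh * d_ns)    # driver: LBA -> C/H/S
        h, s0 = divmod(rem, d_ns)            # s0 is S - 1
        seen = ((c * r_nh) + h) * r_ns + s0  # drive: C/H/S -> sector
        if seen >= drive_total:
            beyond.append(lba)
        elif seen != lba:
            misplaced.append(lba)
        else:
            correct.append(lba)
    return correct, misplaced, beyond


if __name__ == "__main__":
    ok, moved, lost = classify((10, 9, 5), (9, 10, 5))
    print(len(ok), len(moved), len(lost))  # 45 360 45
    print(lost[0], lost[-1])               # 405 449
```

Everything from LBA 45 onward lands in the wrong place, but no error can possibly show up until a request reaches LBA 405.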

Of course, a smart controller in the drive should note that
C >= nc and not try to carry out the operation.  (my drives
weren't that smart -- they would just drive the head actuator
into the stops!)

A simple test in the drive would just check whether the resulting
translated sector number is < su.  Though that still wouldn't
catch errors where H >= nh or S > ns, etc.
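The difference between the two kinds of check, as a Python sketch (hypothetical function names; S is 1-based as before).  The simple test only bounds the translated sector number, so an out-of-range head or sector field can slip through as long as the result stays below su.

```python
def passes_simple_check(c, h, s, nc, nh, ns):
    """Accept if the translated sector number is below su (= nc * nh * ns)."""
    return ((c * nh) + h) * ns + s - 1 < nc * nh * ns


def passes_field_check(c, h, s, nc, nh, ns):
    """Check each CHS register against its own limit."""
    return 0 <= c < nc and 0 <= h < nh and 1 <= s <= ns


GEOM = (9, 10, 5)  # the drive's 9C/10H/5S geometry from the example

if __name__ == "__main__":
    # The runaway request from the example: both checks reject it.
    print(passes_simple_check(9, 8, 5, *GEOM), passes_field_check(9, 8, 5, *GEOM))
    # H >= nh and S > ns: the simple check lets these through.
    print(passes_simple_check(0, 12, 1, *GEOM), passes_field_check(0, 12, 1, *GEOM))
    print(passes_simple_check(0, 0, 7, *GEOM), passes_field_check(0, 0, 7, *GEOM))
```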

I think the problem stems from the fact that you can probe
an IDE drive's *native* geometry as well as its *current*
geometry (IDE drives use INITIALIZE for the controller to
*tell* the drive what its geometry is).  So, if the
BIOS tells the drive "use this geometry", the drive can
be happy with it.  If, a few milliseconds later, the *BSD
boot process looks at the *native* geometry of the drive
and uses *that* to feed the LBA translation buried in
the driver, then the aforementioned problem *silently* (!)
occurs!

>>>> Then how does the driver know:
>>>> - where partitions begin and end
>>> The label, which it reads off the disk (usually sector zero) when
>>> the drive is first accessed.
>> Yes, but the label also contains geometry information.  Are you
>> claiming that this is *ignored*?  I.e. could I scribble "0"s in all
>> the fields and not notice *any* side-effects?
>
>No, I'm not claiming that.  As I explained, FFS filesystems have
>geometry information in the superblock; by default, newfs copies this
>out of the label at filesystem create time.  (newfs has options to
>override the label's geometry info; if you use enough of them, the
>label's geometry _is_ ignored.)


OK.  So, even when "ignoring" the geometry, it is effectively
using command line options to imply a *new* geometry...

>On some ports, the label geometry information may be used to compute
>partitioning info; Sun-compatible disklabels, for example, keep
>partition beginning points in units of cylinders, not sectors.  (Some,
>but not all, NetBSD versions also keep a BSD disklabel around, so that
>only your boot partition is really affected by this.)


So, there is *some* incentive to getting the geometry *right*...
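A quick illustration of that incentive, assuming (as the quoted text says) a Sun-style label that records each partition's start as a cylinder number: where the partition actually begins depends on the heads and sect/track values in the label.  Python sketch; the function names and numbers are hypothetical.

```python
def start_sector(cylno, nh, ns):
    """Sector where a partition begins when its start is stored in cylinders."""
    return cylno * nh * ns


def cylinder_for_offset(offset_sectors, nh, ns):
    """Cylinder number for a sector offset; only cylinder-aligned starts fit."""
    per_cyl = nh * ns
    if offset_sectors % per_cyl:
        raise ValueError("offset is not on a cylinder boundary")
    return offset_sectors // per_cyl


if __name__ == "__main__":
    # The same stored value, cylinder 100, read under two different geometries.
    print(start_sector(100, 16, 63))  # 100800
    print(start_sector(100, 15, 63))  # 94500
```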

>>>> - where to "avoid"
>>> Huh?  I don't know what you're referring to here.  Perhaps I'm just
>>> misinterpreting your smiley....
>> I was referring to the partition table.
>
>See bounds_check_with_label(), which is designed to keep you from
>accidentally overwriting the label.
>
>> And, also, the "magic" that causes the drive to avoid "sector 0"
>> regardless (???) of which partition it appears in.
>> I.e. placing a BSD partition at "0" *or* SWAP at "0"!  (or, is this a
>> mistake waiting to happen?)
>
>FFS does not use the first 8K of a filesystem, apparently specifically
>so that you can put an FFS filesystem at offset zero and not thereby
>wreck your label/bootblock area.


Yes.

>Putting swap at offset zero, yes, is asking for trouble.  I think


Argh!  This is how I have been building disks.  :-(  Time to drag
out the DLT and "reload" them...

>bounds_check_with_label will catch this, unless you swap on RAW_PART,
>in which case you presumably don't care about having a label.
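A much-simplified Python model of the protection being described.  This is not NetBSD's bounds_check_with_label(); it only assumes a label in sector zero, 512-byte sectors, FFS leaving its first 8K unused, a swap area that writes from its very first sector, and that I/O on RAW_PART skips the check.  The names are hypothetical.

```python
SECTOR_BYTES = 512
LABEL_SECTOR = 0                           # assumption: label in sector zero
FFS_UNUSED_SECTORS = 8192 // SECTOR_BYTES  # FFS leaves its first 8K alone


def write_allowed(part_offset, start, nsectors,
                  label_writable=False, raw_part=False):
    """Would a write of nsectors at partition-relative sector start be allowed?"""
    if raw_part:
        return True  # in this model, RAW_PART gets no label protection
    first = part_offset + start
    last = first + nsectors - 1
    touches_label = first <= LABEL_SECTOR <= last
    return label_writable or not touches_label


if __name__ == "__main__":
    # FFS at offset 0: its first write lands past the unused 8K.
    print(write_allowed(0, FFS_UNUSED_SECTORS, 16))  # True
    # Swap at offset 0, assuming it writes from its first sector.
    print(write_allowed(0, 0, 16))                   # False
    # Swapping on RAW_PART: nothing stops it.
    print(write_allowed(0, 0, 16, raw_part=True))    # True
```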
>
>>> At the filesystem<->driver interface, the drive is just a big linear
>>> array of blocks.
>
>>> Below that, it depends on the interface.  SCSI disks are addressed
>>> by sector number.  IDE disks may or may not be; I think this is LBA
>>> addressing versus CHS addressing.
>> But, if CHS addressing is used, then the geometry is significant.
>
>True.  But I don't know which geometry it is that matters; I'm inclined
>to doubt it's the disklabel geometry, but I haven't dug around to see.


I am assuming that the system takes the geometry out of the
disk label and not (?) from the controller, though this
is just supposition.  I would *hope* that it did so since this
is the more flexible approach.  It would, for example, allow
the system to read the label (at a known place on the disk!)
and reINITIALIZE the IDE controller to ensure that the
controller and disk geometries coincided.  This is exactly what
the FreeBSD driver failed to do -- apparently claiming that
not all legacy IDE drives allow you to read "current" and
"native" geometry...
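A toy Python model of the approach suggested above.  SimulatedDrive, its initialize() method, and attach() are hypothetical stand-ins for an IDE drive, its INITIALIZE command, and a driver's attach step, not a real driver API.  It shows two things: LBA 0 is C0/H0/S1 under any geometry, so the label can be read before the geometries agree; and always re-INITIALIZEing from the label makes both translations match without needing to read the drive's "current" geometry at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    cylinders: int
    heads: int
    sectors: int


def lba_to_chs(lba, g):
    c, rem = divmod(lba, g.heads * g.sectors)
    h, s0 = divmod(rem, g.sectors)
    return c, h, s0 + 1  # S is 1-based


def chs_to_lba(c, h, s, g):
    return ((c * g.heads) + h) * g.sectors + s - 1


class SimulatedDrive:
    """Hypothetical drive whose CHS translation is set by INITIALIZE."""

    def __init__(self, native):
        self.native = native
        self.current = native

    def initialize(self, geom):
        self.current = geom

    def sector_for(self, c, h, s):
        return chs_to_lba(c, h, s, self.current)


def attach(drive, label_geom):
    """Hypothetical attach: always re-INITIALIZE from the label's geometry."""
    drive.initialize(label_geom)


if __name__ == "__main__":
    drive = SimulatedDrive(native=Geometry(9, 10, 5))
    drive.initialize(Geometry(18, 5, 5))   # whatever the BIOS happened to pick
    label = Geometry(10, 9, 5)             # geometry recorded in the label
    print(lba_to_chs(0, label))            # (0, 0, 1) in any geometry
    attach(drive, label)
    print(all(drive.sector_for(*lba_to_chs(n, label)) == n for n in range(450)))
```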

Of course, a better solution might be to just not support
i386??  :>

--don