Subject: Re: Another changer, another changer problem
To: NetBSD-current Discussion List <current-users@netbsd.org>
From: Bill Studenmund <skippy@macro.stanford.edu>
List: current-users
Date: 10/04/1998 12:38:46
On Sat, 3 Oct 1998, Greg A. Woods wrote:

> [ On Sat, October 3, 1998 at 12:42:28 (-0700), Bill Studenmund wrote: ]
> > Subject: Re: Another changer, another changer problem
> >
> > The problem is that to put the "raw" partition out of band, we need
> > another major number dedicated to the drive type. So sd's would have two
> > devices, and wd's would have two...
> 
> Another "major" number?  Huh?  Though I've not done the math yet I see
> no problem with dedicating a specific *minor* number to the raw disk and
> still having lots left over for 8 or more partitions per disk.

For disk drives, the minor numbers are split into unit and partition.
Right now, for each unit, we take one partition (for most ports, it's the
third one, 'c') and make it the raw disk.
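
To make the split concrete, here is a minimal sketch in C. It is not the
actual kernel code: the names (disk_unit, disk_part, disk_minor) and the
constants are assumptions for illustration, using 8 partitions per unit and
'c' (index 2, counting 'a' as 0) as the raw partition.

```c
#include <assert.h>

/* Assumed values for illustration; each port defines its own. */
#define MAXPARTITIONS 8   /* partitions per unit in the traditional split */
#define RAW_PART      2   /* index of 'c' (0 = 'a'); some ports differ */

/* Extract the unit number from a disk minor number. */
static int disk_unit(int minor_no)
{
    return minor_no / MAXPARTITIONS;
}

/* Extract the partition index from a disk minor number. */
static int disk_part(int minor_no)
{
    return minor_no % MAXPARTITIONS;
}

/* Build the minor number for a given unit and partition. */
static int disk_minor(int unit, int part)
{
    assert(part >= 0 && part < MAXPARTITIONS);
    return unit * MAXPARTITIONS + part;
}
```

With these assumed values, disk_minor(1, RAW_PART) is 10, which splits back
into unit 1 and partition 2 ('c'). The raw disk costs one partition slot
per unit and nothing more.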

Are you suggesting we assign one partition per unit to be the raw disk, or
for each physical unit, we assign two unit numbers, with one having the
partitions and the other the raw partition? We're doing the former, and
the latter sounds like a waste.

> > As it is, we'll have multiple major numbers for disks soon anyway. Why
> > double that?
> 
> We might need multiple major numbers for multiple scsi buses, but that's
> a whole different issue.  A separate major number for each wd controller
> might be a good idea too....

Why would we need separate major numbers for each scsi bus? Each major
number supports thousands of units. No scsi bus will EVER get that big, so
we'd be wasting space. Even with a huge space for major numbers, this idea
sounds wasteful.

> All of this is why I suggested an alternate plan of using a consistent
> naming scheme such as:
> 
> 	{driver-name}{driver-unit}t{target-unit}l{LUN}
> 
> which would give names such as /dev/esp0t0l0 or /dev/ahc0t0l0 and so on
> for each "raw" disk (with either letters or "p{partition}" appended to
> identify partitions according to the current disklabel).

The whole problem with this entire naming thread is that the kernel
doesn't care what we name things, it only cares about device nodes. All
the ideas I've seen, both in this thread and outside of it, strike me as
being very wasteful of device node space for little gain.

I like this naming scheme. I don't like forcing it into device node space.
What I think would be cool is a device file system which looks at where
scsibuses are attached and what's on them, and exposes those nodes with
names as you've described above. That way you can get your wired-down
names, and we can keep the unit efficiency we get w/ sd*.

> > One of the big concerns with moving to 32-bit devices is that major
> > numbers made on a 32-bit aware system should work on a 32-bit unaware
> > system. The easiest solution was to make the unit/partition split vary w/
> > major number. Then you'd have 8-partition wd's, and 32- or 64-partition
> > wd's. That way, on an i386, major 0 would always be a wd w/ 8 partitions.
> > Out of band raw partitions double even that. 
> 
> Double HUH?

Ok. That explanation didn't work. So here's a question: how exactly can we
have an "out of band" device node for the raw disk? Since device nodes for
disks are processed as major number / unit / partition, we have to be able
to indicate we want the "raw" partition somehow. Right now, we make one of
the partitions special. You were wanting to do something different. So if
blessing a partition isn't ok, we have to do something using either the
unit number or the major number to indicate "raw" drive. Both of those
strike me as wasteful.

I was responding to the implied idea of using an additional major number
to give the "out of band" space. If you had something else in mind, please
correct me.

> There's a wee window when you're booting a new kernel on an old
> filesystem when you'll need to run MAKEDEV or do something similar to
> get the new wide-numbered devices made, but this could have been handled
> in any number of ways, including use of a new revision of the filesystem
> structure itself (and then even fsck could resize the major/minor
> numbers and the kernel would read either 16-bit major/minor number or
> 32-bit or 64-bit ones depending on what the superblock says to do).

We're changing the topic here. First off, the problem is not major/minor,
but unit/partition. The kernel has been dealing w/ 32-bit devices for
quite a while (6 months?). The ffs has dealt with them for years (since
1.0 or so?).

The big problem is that we can't rely on running an upgrade script.
Charles went into this a lot, and it took me a while to see that there
really are just too many failure points. Device nodes with the current
major numbers really have to keep the unit/partition split they have now.

The solution is that, for instance for sd's, we will have two different
major numbers. One, the current one for each port, indicates the
traditional 8 partitions per unit. The other, which would start at 256 or
higher, would indicate 64 partitions per unit (well, maybe 32. That
number's not set in stone yet).

We work this magic with a simple change to the DISKUNIT and DISKPART
macros and changes to the disklabel code to make sure that we don't look
at invalid partitions (especially since the NetBSD disklabel format only
supports up to 22 partitions - otherwise your partition table is multiple
sectors).
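
Here is a minimal sketch of the idea, not the real macros. It uses plain
functions instead of DISKUNIT/DISKPART, and every constant is an assumption
for illustration: the wide majors start at 256, the wide split is 64
partitions (it might end up 32), and the label holds at most 22 partitions.

```c
#include <assert.h>

/* All constants are assumptions for illustration. */
#define WIDE_MAJOR_BASE 256  /* majors at or above this use the wide split */
#define OLD_NPART       8    /* traditional partitions per unit */
#define NEW_NPART       64   /* proposed wide split; might become 32 */
#define LABEL_MAXPART   22   /* most partitions a one-sector label holds */

/* Partitions per unit implied by the major number. */
static int parts_per_unit(int major_no)
{
    return major_no >= WIDE_MAJOR_BASE ? NEW_NPART : OLD_NPART;
}

/* Unit number: the split depends on which major the node uses. */
static int disk_unit(int major_no, int minor_no)
{
    assert(minor_no >= 0);
    return minor_no / parts_per_unit(major_no);
}

/* Partition index under the same major-dependent split. */
static int disk_part(int major_no, int minor_no)
{
    assert(minor_no >= 0);
    return minor_no % parts_per_unit(major_no);
}

/* Nonzero if the disklabel can describe this partition index. */
static int part_is_valid(int major_no, int minor_no)
{
    return disk_part(major_no, minor_no) < LABEL_MAXPART;
}
```

Under these assumptions, minor 10 on an old major (say 4) is unit 1,
partition 2, exactly as today, while minor 10 on major 256 is unit 0,
partition 10. Old device nodes keep their meaning without any upgrade
script, and part_is_valid stands in for the disklabel check that refuses
partition indices of 22 and up.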

Take care,

Bill