Subject: Re: wedges vs. not-quite-wedges, was > 1T filesystems, disklabels,
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 12/20/2002 10:10:10
On Thu, 19 Dec 2002, Jonathan Stone wrote:

> In message <Pine.NEB.4.33.0212191700480.9004-100000@vespasia.home-net.internetconnect.net>
> Bill Studenmund writes
> >On Thu, 19 Dec 2002, Jonathan Stone wrote:
>
> >I'm sorry, I am NOT proposing a subset of an LVM.
>
> ... seems to me that's exactly what you are proposing.  The subset-ness
> is in what's managed (PVs, not LVs) rather than the functionality.
> Sorry if I was unclear.

Hmmm... I hadn't thought of it that way.

What I'm proposing is as much a PV as disklabel is. I'm not sure how much
that is, but I believe they are the same.

> >At boot, we iterate over all disks in the system. We read them for
> >different disklabel types, and find partitions. We add them to a struct
> >diskpart (a new disklabel-replacing structure). When we have found
> >everything (or found 255 user ones), we shove the diskpart into the
> >kernel.
>
> Sorry, but that doesn't answer the question.  Which parts of ``at
> boot'' are in the kernel, and which are done in userspace as we're
> going multi-user? (Yes, I care).

Undecided.

The general idea is that there are a number of partition type checkers,
and you iterate over a list of them. The list is a machine (or
kernel-config) dependent part followed by a NetBSD-global string.

The kernel knows its list, and has a number of checkers compiled in. The
default idea is a set of checkers comparable with what a port has now, but
we can wiggle on that idea. The kernel iterates over its list until
either it has exhausted the list, or it finds a listed checker that's not
compiled in. It then installs a diskpart.
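
To make the kernel-side pass concrete, here is a toy model in Python (not
kernel code, and nothing like this has been written yet). The names
KERNEL_LIST, COMPILED_IN, make_checker and kernel_scan are all made up for
illustration, as is the choice to represent a disk and a diskpart as dicts.

```python
# Toy model of the kernel-side scan; every name here is hypothetical.

def make_checker(scheme):
    """Build a checker that reports the partitions stored under `scheme`."""
    def checker(disk):
        # `disk` is modelled as a dict: scheme name -> list of partitions.
        return list(disk.get(scheme, []))
    return checker

# Checkers compiled into this (imaginary) kernel.
COMPILED_IN = {"disklabel": make_checker("disklabel")}

# The kernel's list. Note that "mbr" is listed but not compiled in.
KERNEL_LIST = ["disklabel", "mbr"]

def kernel_scan(disk, kernel_list=KERNEL_LIST, compiled_in=COMPILED_IN):
    """Run checkers in list order; stop at the first one not compiled in."""
    diskpart = {"partitions": [], "checkers_run": []}
    for name in kernel_list:
        checker = compiled_in.get(name)
        if checker is None:
            break  # listed but not compiled in: leave the rest to userland
        diskpart["partitions"].extend(checker(disk))
        diskpart["checkers_run"].append(name)
    return diskpart
```

The "checkers_run" field stands in for the record of checkers already
called, so that whatever runs later can tell where the kernel stopped.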

The userland daemon/tool (depending on whether it's sitting there running
or is run by an event daemon) notices a disk insert (or gathers a list of
all disks at boot), and looks at them. It loads the diskpart the kernel
has, and iterates over the list. The diskpart will have to include a list
of the checkers already called.

The userland tool will iterate over the kernel list followed by the MI
NetBSD string (which contains ALL the partition types). Partition readers
look to see if they have already been called, and short-circuit out if
they have. Otherwise each partition reader looks to see if it finds its
kind of partition map, and adds partitions accordingly.
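
The userland pass could be modelled like this, again as a Python sketch
rather than real code. MI_LIST, make_reader and userland_scan are
hypothetical names, the partition type names are only examples, and the
diskpart is assumed to be a dict carrying the list of readers already run.
For brevity the "already called" test sits in the loop here, where the
description above has each reader doing that check itself.

```python
# Toy model of the userland pass; every name here is hypothetical.

# Stands in for the MI NetBSD string that names ALL partition types.
MI_LIST = ["disklabel", "mbr", "apm"]

def make_reader(scheme):
    """Build a reader that reports the partitions stored under `scheme`."""
    def reader(disk):
        # `disk` is modelled as a dict: scheme name -> list of partitions.
        return list(disk.get(scheme, []))
    return reader

def userland_scan(disk, diskpart, kernel_list, readers, mi_list=MI_LIST):
    """Extend `diskpart` (as loaded from the kernel) using userland readers."""
    for name in list(kernel_list) + list(mi_list):
        if name in diskpart["checkers_run"]:
            continue  # short-circuit: this reader has already been called
        reader = readers.get(name)
        if reader is None:
            continue  # the tool has no reader for this type
        diskpart["partitions"].extend(reader(disk))
        diskpart["checkers_run"].append(name)
    return diskpart
```

With a kernel that only ran the disklabel checker, the MBR reader runs
once via the kernel list and is skipped when the MI list names it again.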

> >We then listen for disk-insert events. When a disk is inserted, we read it
> >for partitions as above. [...]
>
> Uh, *how*, exactly? What are the failure scenarios? What are the race
> conditions? What security holes do we expose ourselves to?

Don't know yet. We haven't written the code.

> And how do we get notified of disk events?
>
> Yes, we have kqueue. Do we have disk-insertion events?  I can't see
> them, Frank suggests maybe we need an event-routing daemon.  My source
> tree is a few days old, but the only kqueue support I found in a quick
> look is for changers; iirc ch(4) overlays read operations in ways I'd
> rather not, for disks.  I submit that it's not entirely clear how this
> will work.

What's wrong with adding a new kqueue event type? My understanding was
that kqueue was flexible enough to do this. After all, it was touted as a
way to notice network interface additions/removals, so it should (be able
to) do disks too.

> As I think I've said more than once now, I think the _hard_ problems
> lie elsewhere: in specifying how the 256-diskparts-per-disk are carved
> up in a way that gives stable numbering, stable /dev entries, stable
> /etc/fstab entries, etc, when one OS on a multi-OS disk gets
> repartitioned.

I thought I'd explained what I have in mind.

Yes, this is a sticky point.

What I have in mind is a solution that will work well unless you radically
re-partition the disk. It's a compromise, but to do anything else would
mean we need some sort of historical record, which I think is really
not-how-we-do-stuff.

Each partitioning scheme would have an idea of how many partitions it
might have. For disklabel-based ones, that's either 8 or 16. Probably 16.
So when a checker notices its partitioning scheme, it grabs a fixed
number of partition slots (probably 16), then fills in the ones that are
there.

Thus if you add a partition to a FreeBSD disklabel, there is already space
for it, so nothing else moves.

mbr disks would grab four partitions (modulo disklabel ones that were
found). And so on.
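
A small Python model of the slot reservation, to show why the numbering
stays put. SLOTS_PER_SCHEME and assign_slots are hypothetical names, the
block sizes are just the "probably 16" and "four" guesses from above, and
the sketch ignores both the overall per-disk limit and the "modulo
disklabel ones" adjustment for MBR.

```python
# Toy model of fixed-size slot reservation; all names are hypothetical.

# How many slots each scheme reserves once it is detected (assumed sizes).
SLOTS_PER_SCHEME = {"disklabel": 16, "mbr": 4}

def assign_slots(schemes_found):
    """Map (scheme, index) -> slot number on the disk.

    `schemes_found` is a list of (scheme, present_indices) pairs in the
    order the schemes were detected. Each scheme reserves its whole block
    of slots, whether or not every entry is currently in use.
    """
    slots = {}
    base = 0
    for scheme, present in schemes_found:
        size = SLOTS_PER_SCHEME[scheme]
        for index in present:
            if not 0 <= index < size:
                raise ValueError("index %d out of range for %s" % (index, scheme))
            slots[(scheme, index)] = base + index
        base += size  # the next scheme starts after the full reserved block
    return slots
```

Adding a fifth entry to the disklabel fills an already-reserved slot, so
the MBR entries that follow keep the numbers they had before. A radical
re-partition that changes which schemes are found, or their order, would
still shift things, which is the compromise mentioned above.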

> If we can solve _that_, it should be a SMOP to configure different
> ``label'' format-readers as a kernel option, for those who (for whatever
> reason) just don't want to go the userspace route.

Take care,

Bill