Subject: Re: Logical Volume Managers
To: Bill Studenmund <wrstuden@zembu.com>
From: Christian Limpach <chris@Nice.CH>
List: tech-kern
Date: 06/28/2000 15:25:17
> You've kinda muddled questions here. It's one question to have a softc per
> vg, one overall, and one per lv. It's another question as to what is in
> each softc. If you need per-vg, per-pv, and per-lv storage, then you need
> three softc's. :-) Well, three sets of private storage.

I need per-vg and per-lv storage, but accessing an lv also needs access to
the per-vg storage of the vg that lv is in, unless you want to duplicate
that information in each per-lv storage.  I tend to prefer one softc for
the whole lvm system (a rough sketch follows the questions below), except
that I'm somewhat unclear on these points:
- is the number of items in one pool limited, or is there a performance
advantage to using a separate pool for each vg or each lv?
- struct disk has space for a disklabel, and this space is used for the
on-the-fly generation of disklabels for DIOCGPART; can this break?
- are there advantages to allocating memory at boot time versus allocating
it as needed?
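
To make the softc question more concrete, the layout I have in mind is
roughly this (only a sketch; all the names and fields are invented, and
the pv/extent bookkeeping is left out):

    #include <sys/param.h>
    #include <sys/queue.h>
    #include <sys/disk.h>
    #include <sys/disklabel.h>

    /* sketch only: one softc for the whole lvm system, with per-vg and
       per-lv records hanging off it; an lv reaches its vg through lv_vg
       instead of duplicating the per-vg data */
    struct lvm_lv {
            struct lvm_vg      *lv_vg;      /* the vg this lv is in */
            daddr_t             lv_size;    /* size in sectors */
            struct disklabel    lv_label;   /* generated on the fly */
            LIST_ENTRY(lvm_lv)  lv_list;
    };

    struct lvm_vg {
            struct disk         vg_dkdev;   /* one struct disk per vg */
            LIST_HEAD(, lvm_lv) vg_lvs;     /* the lv's in this vg */
            LIST_ENTRY(lvm_vg)  vg_list;
    };

    struct lvm_softc {
            LIST_HEAD(, lvm_vg) sc_vgs;     /* all configured vg's */
    };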

> What exactly is a buffer pool? I think one struct disk per vg is no big
> deal.

My strategy routine gets a single struct buf to process.  If the request
spans several physical extents or striping is used, I need to make several
strategy calls to the drivers which provide the actual storage of the data.
I manage the buffers for these calls via pool_init/pool_get/pool_put;
that's my understanding of how ccd does a similar thing.
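
In code this comes out roughly as below (a sketch only, modelled on my
reading of ccd; lvmbuf and lvm_iodone are invented names, the extent
mapping isn't shown, and the real completion path has to wait for all the
pieces and propagate errors):

    #include <sys/param.h>
    #include <sys/buf.h>
    #include <sys/pool.h>
    #include <sys/vnode.h>

    /* one of these per component request, allocated from a pool which is
       set up once with pool_init() at attach time */
    struct lvmbuf {
            struct buf   lb_buf;    /* buffer handed to the pv's driver */
            struct buf  *lb_obp;    /* the original request */
    };

    static struct pool lvmbuf_pool;

    static void
    lvm_iodone(struct buf *cbp)
    {
            struct lvmbuf *lb = (struct lvmbuf *)cbp;
            struct buf *bp = lb->lb_obp;

            /* the real version must collect errors and only biodone()
               the original request once its last piece has completed */
            pool_put(&lvmbuf_pool, lb);
            biodone(bp);
    }

    /* start one piece of the original request bp on the component vp */
    static void
    lvm_start_piece(struct buf *bp, struct vnode *vp, daddr_t blkno,
        caddr_t data, long bcount)
    {
            struct lvmbuf *lb;

            lb = pool_get(&lvmbuf_pool, PR_WAITOK);
            lb->lb_obp = bp;
            lb->lb_buf.b_flags = bp->b_flags | B_CALL;
            lb->lb_buf.b_iodone = lvm_iodone;
            lb->lb_buf.b_proc = bp->b_proc;
            lb->lb_buf.b_vp = vp;
            lb->lb_buf.b_blkno = blkno;
            lb->lb_buf.b_data = data;
            lb->lb_buf.b_bcount = bcount;
            VOP_STRATEGY(&lb->lb_buf);
    }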

> Why use disklabels?

I think I need to use disklabels since that's the structure used to present
a partition (=logical volume) to the rest of the system.  My driver gets
some DIOCGPART ioctls now and then, and I don't know what would happen if I
didn't handle them correctly.
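
What I do for DIOCGPART right now is more or less the usual thing (sketch;
lvm_lv/lv_label are the invented names from the earlier sketch, and I'm
assuming the struct partinfo with its disklab/part members):

    #include <sys/param.h>
    #include <sys/disklabel.h>
    #include <sys/dkio.h>

    /* sketch: called from the ioctl routine for DIOCGPART */
    static void
    lvm_getpart(struct lvm_lv *lv, struct partinfo *pi)
    {
            pi->disklab = &lv->lv_label;
            /* the generated label has one partition describing this lv */
            pi->part = &lv->lv_label.d_partitions[0];
    }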

> Disklabels are one way to divide up a disk, and LVM's are another. Each lv
> is used in ways similar to a partition, so it doesn't make sense to me for
> them to be subdivided.

I think the problem is that disklabels are used for two things:
- dividing up the space on a disk into several areas
- representing partitions to the rest of the system
I think they are suited quite well for the first use but not so well for
the second (see sbin/newfs.c and how it tries to find out the size of the
partition it is going to initialize...).  There's also the limit on the
number of partitions; without it I would have used one disklabel to hold
all the logical volumes (=partitions) of one volume group.  Instead I
generate one disklabel on the fly for each logical volume.
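
The on-the-fly generation is then more or less what drivers do for disks
without a real label (sketch; the geometry values are made up):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/disklabel.h>

    /* sketch: build a label with a single partition covering one lv */
    static void
    lvm_makelabel(struct disklabel *lp, daddr_t lvsize, int secsize)
    {
            memset(lp, 0, sizeof(*lp));
            lp->d_secsize = secsize;
            lp->d_nsectors = 32;            /* fake geometry */
            lp->d_ntracks = 64;
            lp->d_secpercyl = lp->d_nsectors * lp->d_ntracks;
            lp->d_ncylinders = lvsize / lp->d_secpercyl;
            lp->d_secperunit = lvsize;
            lp->d_npartitions = 1;
            lp->d_partitions[0].p_offset = 0;
            lp->d_partitions[0].p_size = lvsize;
            lp->d_partitions[0].p_fstype = FS_BSDFFS;
            lp->d_magic = DISKMAGIC;
            lp->d_magic2 = DISKMAGIC;
            lp->d_checksum = dkcksum(lp);
    }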

> How are you divvying up device numbers? I'd have thought an LVM would just
> grab one major device number for lv's, and each lv, as found, would get a
> particular minor number. You wouldn't need to be doing the unit/partition
> division trick.
> What does the LVM do here?

It works the way you think it would.  There is no unit/partition division,
and each lv gets a minor as it is found.  This works well except for the
programs which want to read the disklabel, like newfs and maybe mount, but
definitely also the kernel...
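
So the device number handling is nothing more than this (sketch;
lvm_lvtab/lvm_nlvs are invented):

    #include <sys/param.h>

    /* sketch: the whole minor number names an lv,
       no DISKUNIT()/DISKPART() split */
    static struct lvm_lv **lvm_lvtab;   /* grown as lv's are configured */
    static int lvm_nlvs;

    static struct lvm_lv *
    lvm_lvlookup(dev_t dev)
    {
            int unit = minor(dev);

            if (unit >= lvm_nlvs)
                    return (NULL);
            return (lvm_lvtab[unit]);
    }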

> ccd's are pseudo-devices, and pseudo-devices are statically allocated at
> boot. There's nothing which will come along and say, "oh, I found a ppp on
> this bus," so they are treated a little differently in the configuration
> schemework.

Yes and no.  Wouldn't it be nicer to be able to use as many ccd's as you
want?  ccd's can be configured at any time, not only at boot, and I don't
see any reason in the ccd code why it couldn't allocate only the memory
needed when a ccd is actually configured.  It's "oh, there is a ccd here
btw" which the user can trigger at any time.
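
In other words, instead of sizing everything for a compile-time number of
units, the configure path could allocate the state only when a device is
actually configured, something like (sketch, reusing the invented lvm_vg):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/malloc.h>

    /* sketch: allocate per-vg state only when the user configures it;
       free(sc, M_DEVBUF) again on unconfigure */
    static struct lvm_vg *
    lvm_alloc_vg(void)
    {
            struct lvm_vg *sc;

            sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK);
            memset(sc, 0, sizeof(*sc));
            return (sc);
    }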

> I'm not sure, but I think it'd be fine to make vg's pseudo-devices (you'll
> probably have a good handle on the number of vg's you'll have around at
> config time). Also, I think you can use the config framework to find lv's
> under vg's (if not, you can probably talk Jason into doing it).

Hmm, I would advocate against this since it would make the code in the
kernel a lot bigger.  As it is now, the part in the kernel only uses the
data structures it gets passed from the userspace programs.  The kernel
won't read the configuration from the disks and parse it.  I think it's
better to do all this in userspace.
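
The interface I have in mind is just an ioctl which takes the
already-parsed configuration (sketch; LVMIOC_LVCREATE and struct
lvm_lvconf are invented names):

    #include <sys/types.h>
    #include <sys/ioccom.h>

    /* sketch: userspace reads and parses the on-disk metadata and hands
       the kernel a ready-to-use description of one lv; the kernel side
       just copies this into its own data structures */
    struct lvm_lvconf {
            char    lc_vgname[32];  /* which vg this lv belongs to */
            daddr_t lc_size;        /* size in sectors */
            int     lc_nextents;    /* number of physical extents;
                                       the extent map follows */
    };

    #define LVMIOC_LVCREATE _IOW('L', 1, struct lvm_lvconf)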

      christian