tech-kern archive


Re: FFS fragments (was: RAID stripe size)



    Date:        Wed, 30 Apr 2025 09:42:31 +0200
    From:        Edgar Fuß <ef%math.uni-bonn.de@localhost>
    Message-ID:  <aBHUZ8QRvdw8tcZu%trav.math.uni-bonn.de@localhost>

  | > Note that the numbers in parentheses [of the fsck output] are what is free
  | Oooooops. I never understood it that way.

Take Greg's filesystem:

  603427 files, 48107560 used, 11047434 free (184530 frags, 1357863 blocks, 0.3% fragmentation)

There are 11047434 free fragments: 184530 of them are frags from blocks
where at least 1 frag is already allocated, and the rest come from 1357863
full sized blocks not used at all.   8 * 1357863 + 184530 == 11047434

That this calculation works out says there are 8 frags/block (which one
would normally work out as (free - frags) / blocks) ... except that
using dumpfs is a much easier way to discover that value, and dumpfs
also tells you what the block and frag sizes actually are, not just
the number of frags/block, which is all you can deduce from the
above line alone.

The 0.3% is the percentage of the filesystem's space that is free but
sits in partly allocated blocks, and so is not available to be used as
a full sized data block ...  184530 / (1357863 * 8 + 48107560) * 100     (== 0.31)
If that value starts getting too high, the filesystem's block size is
probably too big for the data stored in it.
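
As a quick sanity check of that arithmetic, here is a trivial standalone
C program (purely my own illustration, not anything from fsck or dumpfs;
the variable names are mine) that plugs in the three numbers from the
summary line above and prints the derived frags/block and the
fragmentation percentage:

    #include <stdio.h>

    int
    main(void)
    {
        /* Numbers straight from the fsck summary line quoted above. */
        long long used  = 48107560;   /* used frags */
        long long nfree = 11047434;   /* free frags, total */
        long long ffree = 184530;     /* free frags in partly allocated blocks */
        long long bfree = 1357863;    /* completely free full sized blocks */

        /* frags per block, derived as (free - frags) / blocks */
        long long fpb = (nfree - ffree) / bfree;

        /* fragmentation %, using the same formula as in the text above */
        double pct = (double)ffree / (double)(bfree * fpb + used) * 100.0;

        printf("frags/block   = %lld\n", fpb);    /* prints 8 */
        printf("fragmentation = %.2f%%\n", pct);  /* prints ~0.31% */
        return 0;
    }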

  | > Perhaps surprisingly, the filesystem doesn't really bother keeping track
  | > of how much of anything is allocated
  | Yes, but fsck could.

It could, when it does a full scan, but that's not its job.

fsck's purpose is to validate (and fix when possible, and required)
the file system structure.

A little extra work in dumpfs (which would make it slower, perhaps a lot,
no idea, haven't tried it) might allow it to provide that info though;
that would be a better place to put code for something like that.

It is actually almost there already; with some care you can work out
the fragments for each file from dumpfs -i output (its "blocks" value is
in 512 byte blocks, DEV_BSIZE; fragments are at least 2 of those, as the
smallest frags allowed are 1K - but how many depends upon the filesys
parameters).

Given the block size, frag size, file size, and blocks allocated, the
number of frags assigned to each file can be calculated (I think).
Only consider files smaller than 12 blocks (filesys bsize blocks),
then ignore all multiples of that block size (those are full blocks)
in the allocated block count; the remainder is probably how many
actual frags that file contains (keep track of the units being used
when doing this: 512 byte blocks, frag sized blocks, fs_bsize sized
blocks...).
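
A rough sketch of that per-file calculation (my own code, not anything
dumpfs does; file_frags() and the example numbers are made up for
illustration), assuming you already have the file size, its dumpfs -i
"blocks" count in DEV_BSIZE units, and the filesystem's bsize and fsize:

    #include <stdio.h>

    #define DEV_BSIZE 512

    /*
     * Estimate how many frags (rather than full blocks) a file occupies,
     * given its size, its allocated block count in DEV_BSIZE units (the
     * dumpfs -i "blocks" value), and the filesystem block and frag sizes.
     */
    static long long
    file_frags(long long size, long long devblocks,
               long long bsize, long long fsize)
    {
        long long allocated, whole;

        /* Only files smaller than 12 (direct) blocks can end in frags. */
        if (size >= 12 * bsize)
            return 0;

        allocated = devblocks * DEV_BSIZE;    /* bytes allocated */
        whole = (allocated / bsize) * bsize;  /* full fs_bsize blocks */

        return (allocated - whole) / fsize;   /* frags in the partial tail */
    }

    int
    main(void)
    {
        /* Hypothetical 5000 byte file on a 32K/4K filesystem; dumpfs -i
         * would report 16 DEV_BSIZE blocks for it (two 4K frags). */
        printf("%lld frags\n", file_frags(5000, 16, 32768, 4096));
        return 0;
    }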

  | > If read speed is more important than write speed, then bigger stripes
  | > make more sense.     [[I corrected my typo...]]
  | Why is that so?

Just because, in general, bigger reads equate to faster overall read
speed (fewer rotational delays waiting for the first block of a sequence
to appear).   Of course this only applies to rotating media; SSDs are
entirely different, but even there, there's less overhead in doing one
large read than several smaller ones.   The hope is that when raidframe
reads a stripe, it might contain data for 2 or more blocks, so the next
time a read is done by the application, no actual i/o is needed.
(Of course, this can make writing slower, as a block write needs to
change just part of a stripe.)

But as Mouse said, all this depends upon all kinds of factors, and the
only real way to know is to run (perhaps slightly cut down) versions of
your real workload, and measure yourself, using the hardware you want to
optimise and the data that you want to read (or write) quickly.

Don't use benchmark applications - all they can ever achieve is for
the system to be optimised so the benchmark runs quickly, which is
almost never very much related to any real workload.   Only what you
will actually be using the system for is meaningful.

kre


