Re: Beating a dead horse

Subject: Re: Beating a dead horse
From: "William A. Mahaffey III" <wam%hiwaay.net@localhost>
Date: Tue, 24 Nov 2015 21:57:50 -0553.75

On 11/24/15 19:08, Robert Elz wrote:

     Date:        Mon, 23 Nov 2015 11:18:48 -0553.75
     From:        "William A. Mahaffey III" <wam%hiwaay.net@localhost>
     Message-ID:  <5653492E.1090102%hiwaay.net@localhost>

Much of what you wanted to know has been answered already I think, but
not everything, so....

(in a different order than they were in your message)

   | Also, why did my fdisk
   | choose those values when his chose apparently better ones ?

There's a size threshold - drives smaller get the offset 63 stuff,
and drives larger, 2048 ... the assumption is that on small drives
you don't want to waste too much space, but on big ones a couple of
thousand sectors is really irrelevant...

I suspect the threshold is somewhere between your 1TB and the other 2TB
drives.

   | If so, is there any way to redo that w/o a complete reinstall :-) ?

Since you're using raid, there is, but it would take forever, so you may
not want to do it...   The upside is that the correction would just
slow down your system slightly, leaving it operational the whole time.

I suspect you're not going to want to bother, so I won't give all the
steps, but the basic strategy would be to use your spare disc - stop that
being a spare for now, and repartition it the way that you want all of the
drives partitioned.  Then add that back as a hot spare for the raid array.
Then use raidctl to "fail" one of the other drives - raidframe will then
reconstruct the "failed" drive onto the hot spare (the one that is now
correctly divided up).   Once that process finishes, the one that was failed
is no longer in use, and can be repartitioned.   Do that, then add it as a
hot spare, and then fail another of the drives.   Repeat until done...
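
The rotation described above can be modelled in a few lines, purely as a
sketch of the bookkeeping and not of any raidctl invocation. The disk names
and the count of five array members (four data drives' worth plus parity)
are assumptions for illustration only.

```python
def rotate_through_spare(members, spare):
    """Model the spare rotation; returns (members, spare, rebuild count)."""
    members = list(members)
    repartitioned = {spare}       # the spare is repartitioned before anything else
    rebuilds = 0
    for slot in range(len(members)):
        failed = members[slot]    # "fail" one of the old-layout drives
        members[slot] = spare     # raidframe reconstructs onto the hot spare
        rebuilds += 1
        repartitioned.add(failed) # the failed drive is now idle: repartition it
        spare = failed            # ...and add it back as the new hot spare
    # Every active member should now live on a repartitioned drive.
    assert all(m in repartitioned for m in members)
    return members, spare, rebuilds


# Hypothetical names: five array members plus one hot spare.
members, spare, rebuilds = rotate_through_spare(
    ["d0", "d1", "d2", "d3", "d4"], "spare")
print(members, spare, rebuilds)
```

In this model the number of reconstructions equals the number of array
members, which is where the "one drive a day, about a week" estimate comes
from.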

Expect the whole process to take a week (not continuously, but you're
only likely to do one drive a day, once the reconstruct starts you'll
just leave it to work, and go do other stuff - on that system or whatever).
Actual human time would be about 10 minutes per drive.

If it was me, I wouldn't even think to look see if it was finished till
the next day...

Doing a complete reinstall of everything would probably be done in a
few hours, so ...

Note that none of this is relevant unless you really decide that it
is needed, and you work out first exactly what all the numbers should
be.   Also note that (ab)using raidframe this way will only fix the
alignment of the raid arrays, if any of the raidframe params ought to be
altered, or the filesystem(s) built on the raid array, then that method
won't help at all, and starting again is definitely the best option (and
doing that before you get too invested in data on all those TBs)

   | The machine works well except for horribly slow I/O to the RAID5 I setup

What is your definition of "horribly slow" and are we talking read or write or
both ?


4256EE1 # time dd if=/dev/zero of=/home/testfile bs=16k count=32768
32768+0 records in
32768+0 records out
536870912 bytes transferred in 22.475 secs (23887471 bytes/sec)
       23.28 real         0.10 user         2.38 sys
4256EE1 #

i.e. about 24 MB/s. When I zero-out parts of these drives to reinitialize
them, I see ~120 MB/s for one drive. RAID5 stripes I/O onto the data
drives, so I expect ~4X I/O speed w/ 4 data drives. With various
overheads/inefficiencies, I (think I) expect 350-400 MB/s writes. I
posted a variation of this question a while back, w/ larger amount of
I/O, & someone else replied that they tried the same command & saw ~20X
faster I/O than mine reported.
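
For reference, the arithmetic behind those figures can be sketched as
follows. The byte count and elapsed time are copied from the dd output; the
"ideal" figure is only the naive 4 x single-drive estimate from the
paragraph above and ignores any RAID5 parity cost.

```python
# Figures copied from the dd output above.
bytes_written = 536870912          # 32768 records of 16 KiB each
elapsed_secs = 22.475

measured = bytes_written / elapsed_secs   # bytes per second

# Naive expectation from the text: ~120 MB/s per drive, 4 data drives.
single_drive = 120e6
data_drives = 4
naive_ideal = single_drive * data_drives  # no parity or other overhead counted

print(f"measured:    {measured / 1e6:.1f} MB/s")
print(f"naive ideal: {naive_ideal / 1e6:.0f} MB/s")
print(f"shortfall:   {naive_ideal / measured:.1f}x")
```

The shortfall works out to roughly 20x, in the same range as the difference
the other poster reported.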


Raid5 is not intended to be fast, and for writes it never will be - it
should be reasonable for reads.

What really matters is not some random benchmark result (my filesystem is
faster than yours...) but whether it actually gets your workload done
well enough or not - I have used raid5 for home filesystems, and pkgsrc
distfiles, and other stuff like that (mostly read, occasionally write)
and have never even wondered about its speed - that's all simply irrelevant.
I use raid1 for filesystems with lots of writes (/usr/obj, ...) where
I want write speed to be good.

   | Partitions aligned to 2048 sector boundaries, offset 2048 <------ *DING
   | DING DING* !!!!

Note that that "2048" is an internal fdisk default, for how it will
help you align stuff.  Of itself, it doesn't mean anything to partitions
that have been made.   And:


   | When I used fdisk to check my drives (well, 1 of them, all are
   | identically fdisk-ed & sliced), I see the following:
   |

   | Partition table:
   | 0: NetBSD (sysid 169)
   |      start 2048, size 1953523120 (953869 MB, Cyls 0/32/33-121601/80/63),

What really counts there is the "start 2048".   That's what you want.
2048 is a nice multiple of 8 ...   1953523120 is also a multiple of 8,
so everything there should be nicely setup for 4K blocks.  What the
default alignment would be if you were to change things is irrelevant.

That is, there is absolutely no need to repartition the drives, the current
layout is fine, and is not the problem (if there even really is a problem).
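
The "multiple of 8" check is easy to reproduce. A minimal sketch, assuming
512-byte logical sectors and 4096-byte physical blocks (the helper name is
made up for this example):

```python
SECTOR_BYTES = 512        # logical sector size assumed for fdisk's numbers
PHYS_BLOCK_BYTES = 4096   # 4K physical block, i.e. 8 logical sectors


def is_4k_aligned(sectors: int) -> bool:
    """True if a sector count/offset falls on a 4K physical-block boundary."""
    return (sectors * SECTOR_BYTES) % PHYS_BLOCK_BYTES == 0


start, size = 2048, 1953523120    # from the fdisk output quoted above
print(is_4k_aligned(start), is_4k_aligned(size))
print(is_4k_aligned(63))          # the old-style offset, for comparison
```

Both the start and the size pass; an offset of 63 would not.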

So, now if your I/O really is slower than it should be, and slower than
raidframe's raid5 can reasonably be expected to achieve, I think the issue
must be with either the raidframe or ffs parameters.

Those you haven't given (here anyway, I don't remember from when this
discussion was going on earlier.)

What is your raidframe layout, and what are your ffs parameters?

kre



ffs data from dumpfs for that FS (RAID5 mounted as /home):

4256EE1 # cat dumpfs.OUTPUT.head.txt
file system: /dev/rdk0
format  FFSv2
endian  little-endian
location 65536  (-b 128)
magic   19540119        time    Tue Nov 24 21:43:06 2015
superblock location     65536   id      [ 5593845d ee5eb3c ]
cylgrp  dynamic inodes  FFSv2   sblock  FFSv2   fslevel 5
nbfree  74726702        ndir    139822  nifree  228674724 nffree  9720
ncg     4964    size    943207067       blocks  928593007
bsize   32768   shift   15      mask    0xffff8000
fsize   4096    shift   12      mask    0xfffff000
frag    8       shift   3       fsbtodb 3
bpg     23753   fpg     190024  ipg     46848
minfree 5%      optim   time    maxcontig 2     maxbpg  4096
symlinklen 120  contigsumsize 2
maxfilesize 0x000800800805ffff
nindir  4096    inopb   128
avgfilesize 16384       avgfpdir 64
sblkno  24      cblkno  32      iblkno  40      dblkno  2968
sbsize  4096    cgsize  32768
csaddr  2968    cssize  81920
cgrotor 0       fmod    0       ronly   0       clean   0x02
wapbl version 0x1       location 2      flags 0x0
wapbl loc0 3773140352   loc1 131072     loc2 512        loc3 3
flags   wapbl
fsmnt   /home
volname         swuid   0
cs[].cs_(nbfree,ndir,nifree,nffree):
4256EE1 # raidctl -s dk0
raidctl: ioctl (RAIDFRAME_GET_INFO) failed: Inappropriate ioctl for device
4256EE1 # raidctl -s raid0a
Components:
           /dev/wd0a: optimal
           /dev/wd1a: optimal
No spares.
Component label for /dev/wd0a:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 10, Mod Counter: 123
   Clean: No, Status: 0
   sectPerSU: 32, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 33554368
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Yes
   Last configured as: raid0
Component label for /dev/wd1a:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 10, Mod Counter: 123
   Clean: No, Status: 0
   sectPerSU: 32, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 33554368
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Yes
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
4256EE1 # df -h
Filesystem         Size       Used      Avail %Cap Mounted on
/dev/raid0a         16G       210M        15G   1% /
/dev/raid1a         63G       1.1G        59G   1% /usr
/dev/dk0           3.5T       1.2T       2.1T  37% /home
kernfs             1.0K       1.0K         0B 100% /kern
ptyfs              1.0K       1.0K         0B 100% /dev/pts
procfs             4.0K       4.0K         0B 100% /proc
tmpfs              8.0G       4.0K       8.0G   0% /tmp
4256EE1 #
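
As a rough sanity check, the dumpfs numbers above are consistent with each
other and with df. This is only a sketch; the 512-byte device block size is
an assumption (it matches the blocksize raidctl reports for raid0a).

```python
# Values copied from the dumpfs output above.
bsize, fsize, frag = 32768, 4096, 8
fsbtodb = 3
size_frags = 943207067            # "size", counted in fragments

DEV_BLOCK_BYTES = 512             # assumed device block size

assert bsize == fsize * frag                   # 8 fragments per block
assert fsize == DEV_BLOCK_BYTES << fsbtodb     # fragment = 8 device blocks
assert bsize == 1 << 15 and fsize == 1 << 12   # the "shift" fields

total_bytes = size_frags * fsize
total_tib = total_bytes / 2**40
print(f"filesystem size: {total_tib:.2f} TiB")
```

The result is about 3.5 TiB, in line with the size df shows for /home.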

Because of its size (> 2 TB) it was setup using dkctl & raidframe won't
report anything about it. How can I get that info for you ? Thanks & TIA.


--

	William A. Mahaffey III

 ----------------------------------------------------------------------

	"The M1 Garand is without doubt the finest implement of war
	 ever devised by man."
                           -- Gen. George S. Patton Jr.
