NetBSD-Users archive


Re: Prepping to install



On 05/13/15 13:14, David Brownlee wrote:
On 13 May 2015 at 16:03, William A. Mahaffey III <wam%hiwaay.net@localhost> wrote:
On 05/13/15 08:48, David Brownlee wrote:
On 12 May 2015 at 16:01, William A. Mahaffey III <wam%hiwaay.net@localhost> wrote:
On 05/12/15 02:32, David Brownlee wrote:
On 11 May 2015 at 23:46, William A. Mahaffey III <wam%hiwaay.net@localhost> wrote:

If you are using RAID5 I would strongly recommend keeping to
"power-of-two + 1" components, to keep the stripe size as a nice power
of two, otherwise performance is... significantly impaired.
Hmmmm .... Could you amplify on that point a bit ? I am intending to maximize available storage & have already procured the mbd & 6 drives, but I could rethink things if my possibly hasty choices would be too burdensome ....
For RAID5 to perform efficiently, data should be written in units which are aligned with the RAID stripes and are a multiple of the stripe in size; otherwise a simple write turns into a read of the stripe, modification of the affected part, and then a write.

Filesystems tend to have sectors and blocks which are powers of two,
so the easiest way to arrange this for ffs is for the filesystem block
size to be a multiple of the stripe size ("1" is a fine multiple in
this case).
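As an illustration (a sketch with made-up sizes, not figures from this thread): the full stripe of a RAID5 set is (components - 1) data disks times the per-disk stripe unit, and only the 5-component case lands on a power of two:

```shell
#!/bin/sh
# Full-stripe size for RAID5 = (components - 1) data disks times the
# per-disk stripe unit. The 32K stripe unit here is illustrative.
stripe_unit=32768   # 32K per-disk stripe unit (64 sectors)

for ncomp in 5 6; do
    data_disks=$((ncomp - 1))
    stripe=$((data_disks * stripe_unit))
    # A power of two has exactly one bit set: n & (n-1) == 0
    if [ $((stripe & (stripe - 1))) -eq 0 ]; then
        pow2=yes
    else
        pow2=no
    fi
    echo "$ncomp components: full stripe $stripe bytes, power of two: $pow2"
done
```

With 5 components the 131072-byte (128K) stripe divides evenly into power-of-two filesystem blocks; with 6 components the 163840-byte stripe does not, which is the mismatch being described.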

This is similar to the issue with drives which have 4K physical sectors but present them as 512-byte sectors - if a filesystem is not 4K aligned, then write performance suffers horribly.
Hmmmmm .... OK, I think I have it. The stripe size is N * <some underlying
(disk|RAID) block size>, & if N (the # of active drives) is odd or prime (or
both, as in my case), we would have/need bizarre filesystem block sizes (for
alignment w/ RAID stripes) or unaligned FS blocks/sectors, which give crappy
performance, right ? Could you estimate how crappy crappy really is ? 25%
slower ? 50%, 100%, more ? Me scots sensibilities hate having almost 1 TiB
of drive sitting around idle (although I do crave speed enough to override)
:-/ ....
If you manage 25% of the performance (that is, "only" a 75% hit) I would be surprised. I'd also be curious to see what number you do get :) - I'm quite fond of pkgsrc/benchmarks/bonnie++ to get simple comparable numbers. If you are testing, some things to vary:
- Number of drives (5, 6)
- Stripe size, eg 4K per drive or 8K per drive
- Filesystem block size 32K, 64K (may not be able to use 64K for boot
partitions)
- mounting with '-o log' or not (generally you want this :)
Remember to ensure you have good (at least 4K) alignment on the base partitions. If you have a modern '4K under the covers' drive and start at sector 63... it's not a good place to be.
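The alignment check itself is simple arithmetic — a 4K boundary falls every 8 512-byte sectors, so sector 63 (the old fdisk default) fails and 64 or 2048 passes. A minimal sketch:

```shell
#!/bin/sh
# Check whether a partition's start sector (in 512-byte units) sits
# on a 4K boundary, i.e. is a multiple of 8 sectors.
check_align() {
    start=$1
    if [ $((start % 8)) -eq 0 ]; then
        echo "sector $start: 4K aligned"
    else
        echo "sector $start: NOT 4K aligned"
    fi
}

check_align 63      # classic fdisk default - misaligned
check_align 2048    # common modern default - aligned
```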

If you want to maximise space with some redundancy then as you say,
RAID5 is the way to go for the bulk of the storage.

A while back I setup a machine with 5 * 2TB disks with netbsd-6, with
small RAID1 partitions for root and the bulk as RAID5

http://abs0d.blogspot.co.uk/2011/08/setting-up-8tb-netbsd-file-server.html
(wow, was that really four years ago) - in your position I might keep
one 1TB as a scratch/build space and then RAID up the rest.

If you have time definitely experiment, get a feel for the different
performance available from the different options.
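For reference, a small two-disk RAID1 set of the kind described is expressed to RAIDframe with a config file along these lines (a sketch; the device names, stripe unit, and file name are assumptions, not details from the post — see raidctl(8)):

```
# /etc/raid0.conf - hypothetical RAIDframe config for a two-disk
# RAID1 root set (device names and sizes are assumptions)
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/wd0a
/dev/wd1a

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 1

START queue
fifo 100
```

It would then be brought up with something like `raidctl -C /etc/raid0.conf raid0`, labelled with `raidctl -I <serial> raid0` (the serial number is your choice), and initialized with `raidctl -iv raid0` — the blog post linked above walks through the real procedure.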
*Wow*, another fabulous resource. Your blog documents almost verbatim
what I
have in mind. I am going w/ 6 drives (already procured, 6 SATA3 slots on
the
mbd, done deal), but philosophically very close to what you describe. 1
question: if you were doing this again today, would it be fdisk or GPT ?
If I had >2TB drives it would be GPT :) If not, I would still stick with fdisk. The complexity of GPT setup and wedge autoconfiguration is still greater than fdisk and disklabel. I know I'm going to have to move to it at some point, but I'm going to hold off until I need to.

I think I am looking at 4 partitions per drive: ~16 GB for / (RAID1, 2 drives) & /usr (4 drives, RAID10), 16 GB for swap (kernel driver, all 6 drives), 16-32 GB for /var (RAID5, all 6 drives), & the rest for /home (RAID5, all 6 drives). TIA & thanks again.
I would definitely hold off on RAID5 for everything except the large /home. RAID1 is much simpler and performs better for writes. I would also try to avoid configuring multiple RAID5s across overlapping sets of disks; while that theoretically provides more IO bandwidth, the bandwidth will have to compete with all the other filesystems and swap usage on the system.

If you wanted to use all six disks:
- 32G(RAID1 root+usr) 910G(non raid scratch space)
- 32G(RAID1 root+usr) 910G(RAID5 home)
- 32G(RAID1 var) 910G(RAID5 home)
- 32G(RAID1 var) 910G(RAID5 home)
- 32G(RAID1 swap+spare) 910G(RAID5 home)
- 32G(RAID1 swap+spare) 910G(RAID5 home)
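On each disk that split might look roughly like this in disklabel terms (a sketch; the sector counts assume a nominal 1TB disk and 2048-sector alignment, and the partition letters follow the usual NetBSD/amd64 convention — none of these numbers are from the thread):

```
# hypothetical disklabel fragment for one 1TB disk
#        size     offset   fstype
 a:   67108864       2048    RAID   # 32G component of a RAID1 pool
 e: 1886414256   67110912    RAID   # ~900G component of the RAID5 /home
 d: 1953525168          0  unused   # whole disk
```

Note both RAID components start on 4K boundaries (2048 and 67110912 are multiples of 8), per the alignment advice earlier in the thread.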

32GB space notes:
- This gives you three 32GB RAID1 'pools' to allocate everything
outside of /home
- Can adjust the 32G up or down before partitioning, but all should be the
same
- In the suggestion, root+usr are kept on the same RAID (and could be
a single partition), so that the system can have all of the userland
available with only one disk attached, and a 'spare' partition is left
in case of later moderate additional space needs - maybe an extra
partition for /usr/pkg?, or for /var/pgsql, etc
- Obviously allocate usage within pools to taste - could put /usr on a separate RAID to provide more IO bandwidth for root & /usr
This is interesting. I kinda wanted swap spread out over all 6 drives for better swap I/O performance, an issue I am having with another box which is laid out sorta like this, with swap 'on top of' a RAID0 block (admittedly under Linux, not *BSD, but still); swap performance is horrible, several min. to page in 200-300 MB worth of paged-out VM. I was planning on as much parallelization of each RAID as possible for max performance, & swap handled by the kernel driver. Others have suggested swap on a RAID 'partition'; is that more de rigueur for NetBSD, or the other BSDs for that matter ? This box, under FreeBSD 9.3R-p13, has 4 swap partitions under straight kernel management, & seems very spry, although it also has a lot of RAM & doesn't swap much (on purpose, BTW) ....
Separate swap devices will give the best performance; RAID1 or 5 will give robustness in the face of a single component failure. You pays your money... Of course, if you have dedicated partitions on the disk which you could RAID then you can even change your mind after install: swapctl off the swap, mess with the partitions and away you go (nerves of steel advised, though not required :)
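A sketch of what the "separate swap devices" option might look like in /etc/fstab (the device names and the priority option are assumptions — check fstab(5) and swapctl(8) before relying on them; equal priorities let the kernel interleave paging across spindles):

```
# hypothetical fstab swap entries - one swap partition per disk
/dev/wd0b   none  swap  sw,priority=0  0 0
/dev/wd1b   none  swap  sw,priority=0  0 0
# or, trading speed for robustness, swap on a RAID1 set instead:
#/dev/raid1b none swap  sw             0 0
```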

910GB space notes:
- This gives 5* 910GB RAID5, which provides 4*910G (or 3640G) of space
- One disk is not included in the RAID5. This could be saved as a
spare for a RAID5 component failure (though a better approach might be
to have a disk on the desk next to the machine :), or used as
non-raided scratch space. If it will not be active, then it is
probably best to put it on one of the components holding the heaviest
used 32G, or the most important 32G
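The space arithmetic above is just (components - 1) times the component size, since one component's worth of space goes to distributed parity:

```shell
#!/bin/sh
# RAID5 usable space = (components - 1) * component size;
# one component's worth of space is consumed by distributed parity
components=5
component_gb=910
usable_gb=$(( (components - 1) * component_gb ))
echo "RAID5 over ${components} x ${component_gb}G -> ${usable_gb}G usable"
```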
Your assessments are quite persuasive, I think I now like the 5 drive RAID5
for home, with that last partition as nebulous scratch space.

Note in the above that IO to /home will hit (almost) all disks, and will affect all of the 32GB pools, so if you have heavy IO to /home do not expect blistering performance from any filesystem. On the other hand, when /home has very light IO then you should have relatively nice multi-spindle performance from the other filesystems.
Yeah, but it would speed up access to *just* /home, right ? This box will be
backing up other boxen on my LAN, initially to a directory under /home, so I
want that I/O to be as swift as possible. I am maxing out the RAM (also
already procured), so I hope I don't have too much contention between I/O to
/home & swap ....
If you want home to be as fast as possible, then you really want to prefer RAID1 to RAID5 (which conflicts with the space... I know). I run dirvish overnight from some machines to a RAID5 setup pretty much identical to the one in my post, and it works well enough.

Having said all that, if I had the time to play I would install onto a
USB key, then script up the building and partitioning of the system in
many different forms and then chroot into the result and run some
tests to see how it performs.
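One way to script the "many different forms" is to generate the matrix of test combinations as command lines first and eyeball them before anything touches a disk. A sketch (the raid device, mount point, and bonnie++ flags are assumptions, not the author's script):

```shell
#!/bin/sh
# Build (do not run) one command line per stripe-unit / block-size
# combination mentioned above; review the list, then execute by hand.
matrix=""
for su in 4k 8k; do                  # per-disk stripe unit
    for bs in 32768 65536; do        # ffs block size
        cmd="newfs -b $bs /dev/rraid1a && mount -o log /dev/raid1a /mnt && bonnie++ -d /mnt -u root  # stripe unit $su"
        matrix="$matrix$cmd
"
    done
done
printf '%s' "$matrix"
```

Varying the stripe unit itself means rebuilding the RAID set between runs, so in practice each line of the 8k-stripe half belongs to a separate raidctl configuration pass.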
I *definitely* want to script the partitioning both for repeatability in the
event of drive failure (or setting up another box) & to avoid fat-fingered
screw-ups !!!!

Thanks again for a fabulously informative reply.
Having discussed all this RAID5 goodness I feel obliged to comment
that when I finally run out of space and need a new build I'm probably
going to go for 4TB disks with six of them in RAID1 pairs for 12TB
with flexibility for adding more space (in 4TB units). If I *needed*
to get 16TB out of them I'd go the RAID5 route again, but I'm willing
to trade off the extra space for speed and simplicity. Of course I'm
really holding off having to actually *buy* six 4TB disks for as long
as humanly possible (by which point they may be 6TB disks, but who can
tell :)

OK, I'm back, after some irritating delays w/ hardware issues & a few 'life-intrusions'. I downloaded the 6.1.5 install image from a link provided earlier in this thread & just kicked it off to begin the install, planning to follow the earlier-referenced docs from Mr. Brownlee pretty closely. I immediately get screenfuls of messages about 'input/output error', in green on the screen if that matters (rest of the text in white). I also got similar messages during boot-up from the install image, but it eventually settled down & booted the installer. The messages look like:

ixpide0:0:0: recal drive fault
wd0d: device fault writing fsbn 0 (wd0 bn 0; cn 0 tn 0 sn 0) retrying

I got the above trying to write out a simple fdisk partition table for the 1st drive, following Mr. Brownlee's (excellent) documentation of his work doing a very similar install a few years back. I got these in bunches of 6 (12 lines total). I got many very similar messages during initial boot of the install image, referring to wd0, wd4 & wd5. All drives are virgin from the manufacturer (until now). I *think* some referred to reading instead of writing, but they scrolled by & I don't recall exactly. They were always 2 lines per message, formatted much as the above, 6 tries, for 12 lines per message total. The mbd is a Supermicro H8SCM AMD C32 board. I *did* take the default, which was install w/ ACPI. Try option 2 (install w/o ACPI) ? Any clues appreciated & thanks in advance.

--

	William A. Mahaffey III

 ----------------------------------------------------------------------

	"The M1 Garand is without doubt the finest implement of war
	 ever devised by man."
                           -- Gen. George S. Patton Jr.


