Source-Changes-HG archive


[src/netbsd-1-5]: src/sbin/raidctl Pullup 1.22 [oster]:



details:   https://anonhg.NetBSD.org/src/rev/34d8529db6e6
branches:  netbsd-1-5
changeset: 489996:34d8529db6e6
user:      tv <tv%NetBSD.org@localhost>
date:      Mon Oct 30 21:58:50 2000 +0000

description:
Pullup 1.22 [oster]:
- cleanup wording and add additional comments on such things as
    "component1" and "raidctl -A yes"
- add a note about how to build a RAID set with a limited number of disks
    (thanks to Simon Burge for suggestions)
- improve layout of 'raidctl -i' discussion (thanks to Hubert Feyrer)
- add a (small) section on Performance Tuning

diffstat:

 sbin/raidctl/raidctl.8 |  190 ++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 172 insertions(+), 18 deletions(-)

diffs (239 lines):

diff -r 324c031239b3 -r 34d8529db6e6 sbin/raidctl/raidctl.8
--- a/sbin/raidctl/raidctl.8    Thu Oct 26 21:12:21 2000 +0000
+++ b/sbin/raidctl/raidctl.8    Mon Oct 30 21:58:50 2000 +0000
@@ -1,4 +1,4 @@
-.\"     $NetBSD: raidctl.8,v 1.19.2.2 2000/08/10 16:22:28 oster Exp $
+.\"     $NetBSD: raidctl.8,v 1.19.2.3 2000/10/30 21:58:50 tv Exp $
 .\"
 .\" Copyright (c) 1998 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -581,12 +581,9 @@
 as using the same serial number for all RAID sets will only serve to
 decrease the usefulness of the component label checking.
 .Pp
-Initializing the RAID set is done via:
-.Bd -unfilled -offset indent
-raidctl -i raid0
-.Ed
-.Pp
-This initialization 
+Initializing the RAID set is done via the
+.Fl i
+option.  This initialization 
 .Ar MUST
 be done for 
 .Ar all
@@ -595,7 +592,11 @@
 quite time-consuming, the
 .Fl v
 option may be also used in conjunction with
-.Fl i .  
+.Fl i :
+.Bd -unfilled -offset indent
+raidctl -iv raid0
+.Ed
+.Pp
 This will give more verbose output on the
 status of the initialization:
 .Bd -unfilled -offset indent
@@ -624,6 +625,45 @@
 on the device or its filesystems, and then to mount the filesystems
 for use.
 .Pp
+Under certain circumstances (e.g. the additional component has not
+arrived, or data is being migrated off of a disk destined to become a
+component) it may be desirable to configure a RAID 1 set with only
+a single component.  This can be achieved by configuring the set with
+a physically existing component (as either the first or second
+component) and with a
+.Sq fake
+component.  In the following:
+.Bd -unfilled -offset indent
+START array
+# numRow numCol numSpare
+1 2 0
+
+START disks
+/dev/sd6e
+/dev/sd0e
+
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
+128 1 1 1
+
+START queue
+fifo 100
+.Ed
+.Pp
+/dev/sd0e is the real component, and will be the second disk of the
+RAID 1 set.  The component /dev/sd6e, which must exist but must have
+no physical device associated with it, is simply used as a placeholder.
+Configuration (using 
+.Fl C
+and 
+.Fl I Ar 12345
+as above) proceeds normally, but initialization of the RAID set will
+have to wait until all physical components are present.  After
+configuration, this set can be used normally, but will be operating 
+in degraded mode.  Once a second physical component is obtained, it
+can be hot-added, the existing data mirrored, and normal operation
+resumed.
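+For example, if the new disk were to appear as /dev/sd1e (the actual
+device name will depend on the hardware present), it could be added as
+a hot spare and the data reconstructed onto it with:
+.Bd -unfilled -offset indent
+raidctl -a /dev/sd1e raid0
+raidctl -F /dev/sd6e raid0
+.Ed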
+.Pp
 .Ss Maintenance of the RAID set
 After the parity has been initialized for the first time, the command:
 .Bd -unfilled -offset indent
@@ -887,6 +927,31 @@
 No spares.
 .Ed
 .Pp
+In circumstances where a particular component is completely
+unavailable after a reboot, a special component name will be used to
+indicate the missing component.  For example:
+.Bd -unfilled -offset indent
+Components:
+           /dev/sd2e: optimal
+          component1: failed
+No spares.
+.Ed
+.Pp
+indicates that the second component of this RAID set was not detected
+at all by the auto-configuration code.  The name
+.Sq component1
+can be used anywhere a normal component name would be used.  For
+example, to add a hot spare to the above set, and rebuild to that hot
+spare, the following could be done:
+.Bd -unfilled -offset indent
+raidctl -a /dev/sd3e raid0
+raidctl -F component1 raid0
+.Ed
+.Pp
+at which point the data missing from 
+.Sq component1 
+would be reconstructed onto /dev/sd3e.
+.Pp
 .Ss RAID on RAID
 RAID sets can be layered to create more complex and much larger RAID
 sets.  A RAID 0 set, for example, could be constructed from four RAID
@@ -947,16 +1012,24 @@
 raidctl -A root raid0
 .Ed
 .Pp
-Note that since kernels cannot (currently) be directly read from RAID
-components or RAID sets, some other mechanism must be used to get a
-kernel booting.  For example, a small partition containing only the
-secondary boot-blocks and an alternate kernel (or two) could be used.
-Once a kernel is booting however, and an auto-configuring RAID set is
-found that is eligible to be root, then that RAID set will be
-auto-configured and used as the root device.  If two or more RAID sets
-claim to be root devices, then the user will be prompted to select the
-root device.  At this time, RAID 0, 1, 4, and 5 sets are all supported
-as root devices.
+To return raid0 to being just an auto-configuring set, simply use the
+.Fl A Ar yes
+arguments.
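+For example:
+.Bd -unfilled -offset indent
+raidctl -A yes raid0
+.Ed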
+.Pp
+Note that kernels can only be directly read from RAID 1 components on
+alpha and pmax architectures.  On those architectures, the 
+.Dv FS_RAID
+filesystem is recognized by the bootblocks, and will properly load the
+kernel directly from a RAID 1 component.  For other architectures, or
+to support the root filesystem on other RAID sets, some other
+mechanism must be used to get a kernel booting.  For example, a small
+partition containing only the secondary boot-blocks and an alternate
+kernel (or two) could be used.  Once a kernel is booting, however, and
+an auto-configuring RAID set is found that is eligible to be root,
+then that RAID set will be auto-configured and used as the root
+device.  If two or more RAID sets claim to be root devices, then the
+user will be prompted to select the root device.  At this time, RAID
+0, 1, 4, and 5 sets are all supported as root devices.
 .Pp
 A typical RAID 1 setup with root on RAID might be as follows:
 .Bl -enum
@@ -1022,6 +1095,87 @@
 .Pp
 at which point the device is ready to be reconfigured.
 .Pp
+.Ss Performance Tuning
+Selection of the various parameter values which result in the best
+performance can be quite tricky, and often requires a bit of
+trial-and-error to get those values most appropriate for a given system.
+A whole range of factors come into play, including:
+.Bl -enum
+.It
+Types of components (e.g. SCSI vs. IDE) and their bandwidth
+.It
+Types of controller cards and their bandwidth
+.It
+Distribution of components among controllers
+.It
+IO bandwidth
+.It
+Filesystem access patterns
+.It 
+CPU speed
+.El
+.Pp
+As with most performance tuning, benchmarking under real-life loads
+may be the only way to measure expected performance.  Understanding
+some of the underlying technology is also useful in tuning.  The goal
+of this section is to provide pointers to those parameters which may
+make significant differences in performance.
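+As a very rough first check (and no substitute for benchmarking under
+the intended workload), sequential throughput might be sampled with
+something like the following, where /mnt is assumed to be a filesystem
+mounted on the RAID set:
+.Bd -unfilled -offset indent
+dd if=/dev/zero of=/mnt/testfile bs=64k count=16384
+dd if=/mnt/testfile of=/dev/null bs=64k
+.Ed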
+.Pp
+For a RAID 1 set, a SectPerSU value of 64 or 128 is typically
+sufficient.  Since data in a RAID 1 set is arranged in a linear
+fashion on each component, selecting an appropriate stripe size is
+somewhat less critical than it is for a RAID 5 set.  However, a stripe
+size that is too small will cause large IOs to be broken up into a
+number of smaller ones, hurting performance.  At the same time, a
+large stripe size may cause problems with concurrent accesses to
+stripes, which may also affect performance.  Thus values in the range
+of 32 to 128 are often the most effective.
+.Pp
+Tuning RAID 5 sets is trickier.  In the best case, IO is presented to
+the RAID set one stripe at a time.  Since the entire stripe is
+available at the beginning of the IO, the parity of that stripe can
+be calculated before the stripe is written, and then the stripe data
+and parity can be written in parallel.  When the amount of data being
+written is less than a full stripe worth, the
+.Sq small write
+problem occurs.  Since a 
+.Sq small write
+means only a portion of the stripe on the components is going to
+change, the data (and parity) on the components must be updated
+slightly differently.  First, the 
+.Sq old parity
+and 
+.Sq old data
+must be read from the components.  Then the new parity is constructed,
+using the new data to be written, and the old data and old parity.
+Finally, the new data and new parity are written.  All this extra data
+shuffling results in a serious loss of performance: a small write is
+typically 2 to 4 times slower than a full stripe write (or read).  To
+combat this
+problem in the real world, it may be useful to ensure that stripe
+sizes are small enough that a
+.Sq large IO
+from the system will use exactly one large stripe write.  As is seen
+later, there are some filesystem dependencies which may come into play
+here as well.
+.Pp
+Since the size of a 
+.Sq large IO
+is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
+be desirable to select a SectPerSU value of 16 blocks (8K) or 32
+blocks (16K).  Since there are 4 data stripe units per stripe, the maximum
+data per stripe is 64 blocks (32K) or 128 blocks (64K).  Again,
+empirical measurement will provide the best indicators of which
+values will yield better performance.
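+For example, on a 5-drive RAID 5 set, a layout section using a
+SectPerSU value of 16 (8K stripe units, giving 32K of data per full
+stripe across the 4 data components) might look like:
+.Bd -unfilled -offset indent
+START layout
+# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
+16 1 1 5
+.Ed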
+.Pp
+The parameters used for the filesystem are also critical to good
+performance.  For 
+.Xr newfs 8 , 
+for example, increasing the block size to 32K or 64K may improve
+performance dramatically.  As well, changing the cylinders-per-group
+parameter from 16 to 32 or higher is often not only necessary for
+larger filesystems, but may also have positive performance
+implications.
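+For example (the raw partition name will depend on how the RAID set
+has been disklabeled), such a filesystem might be created with:
+.Bd -unfilled -offset indent
+newfs -b 32768 -c 32 /dev/rraid0e
+.Ed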
+.Pp
 .Ss Summary
 Despite the length of this man-page, configuring a RAID set is a
 relatively straight-forward process.  All that needs to be done is the


