NetBSD-Users archive


RAIDframe write performance below expectations on a RAID-1 of two magnetic disks on NetBSD/amd64 9.1



Hello all,

this is about the write performance of RAIDframe. There is a lot to read about this on these mailing lists, and I have been busy trying out everything I could find: different alignment methods, changing the write strategy of the drives, and experimenting with the file system parameters. Unfortunately, I have now reached a point where I have been before. Back then, my "solution" was to abandon NetBSD and use FreeBSD with ZFS instead. This time I don't want to give up so quickly :-) So I'll try to describe my setup in as much detail as possible. Maybe someone sees my obvious mistake and can give me the crucial tip.

The root filesystem is on a separate disk set (also on RAIDframe, but on SSD storage) and is not part of this problem. The problem concerns two identical magnetic hard disks (each 1 TB, 4 KiB physical sector size) from which I want to form a RAID-1 with RAIDframe. To do this, I first created a partition for RAIDframe on each of the two disks via GPT:

	# gpt create wd2
	# gpt create wd3
	# gpt add -l raid1cmp0 -a 4k -t raid wd2
	# gpt add -l raid1cmp1 -a 4k -t raid wd3

Then I initialized the RAID with the following parameter file:

	START array
	1 2 0

	START disks
	NAME=raid1cmp0
	NAME=raid1cmp1

	START layout
	128 1 1 1

	START queue
	fifo 100

The speed of the parity rewrite gave me hope at first. I had already made several attempts with obviously wrong alignment, which resulted in run times of approx. 10 hours. With the correct alignment, the parity rewrite takes about 2 hours, which, according to my research, should be a reasonable figure for this disk size.

On the RAID device (/dev/raid1 for me) I then created another GPT partition table and created a 4k-aligned partition in it as well:

	# gpt create raid1
	# gpt add -l data -a 4k -t ffs raid1
	# newfs -O 2 -b 16k -f 2k NAME=data

This was formatted with an FFS filesystem (with the recommended parameters from [1]) and mounted with the mount option "log".
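
For reference, the sizes implied by the layout line and the newfs options can be worked out with a few lines of shell arithmetic. This is only a sketch: it assumes that the first field of the layout line (128) is the stripe unit size in sectors, as raidctl(8) describes it, and that those sectors are 512 bytes each. The function names are made up for illustration.

```sh
#!/bin/sh
# Sketch: sizes implied by the layout line "128 1 1 1" and "newfs -b 16k".
# Assumption: the stripe unit is counted in 512-byte sectors.

# su_bytes SECTORS_PER_SU SECTOR_BYTES: stripe unit size in bytes
su_bytes() {
    echo $(( $1 * $2 ))
}

# blocks_per_su SU_BYTES FS_BLOCK_BYTES: file system blocks per stripe unit
blocks_per_su() {
    echo $(( $1 / $2 ))
}

su=$(su_bytes 128 512)
echo "stripe unit: $su bytes"
echo "16k blocks per stripe unit: $(blocks_per_su "$su" 16384)"
```

Under these assumptions one stripe unit holds 64 KiB, i.e. four 16 KiB file system blocks.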

However, the write throughput remains well below my expectations and I am despairing. When writing a 1 GB file, I achieve write rates of about 2 MB/s.
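
The exact command behind the 2 MB/s figure is not shown here. A simple sequential write test of this kind could look like the following sketch; the function name and the mount point in the example are hypothetical, and the timing only has whole-second resolution.

```sh
#!/bin/sh
# Sketch of a simple sequential write test; not necessarily the command
# that produced the figure quoted above.

# write_test FILE MIB: write MIB mebibytes of zeros to FILE, flush, and
# print the approximate throughput in MiB/s (integer division).
write_test() {
    file=$1
    mib=$2
    start=$(date +%s)
    dd if=/dev/zero of="$file" bs=1048576 count="$mib" 2>/dev/null || return 1
    sync
    end=$(date +%s)
    elapsed=$(( end - start ))
    # Avoid dividing by zero when the write finishes within one second.
    [ "$elapsed" -gt 0 ] || elapsed=1
    echo $(( mib / elapsed ))
}

# Example (hypothetical mount point):
#   write_test /mnt/data/testfile 1024
```

Because the result is rounded down to whole MiB/s, a test file of 1 GB or more gives a more meaningful number than a small one.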

To me, this looks a bit like the hard drives are operating in the wrong mode in general. I suspected that PIO mode might be in use instead of DMA, but I haven't found a reliable way to check that. Regardless of this, the disks achieve significantly higher write rates (80 MB/s and more) on their own (i.e. without RAIDframe). dmesg says this:

```
jupiter$ dmesg|grep wd2
[     2.660025] wd2 at atabus2 drive 0
[     2.660025] wd2: <ST1000LM048-2E7172>
[     2.660025] wd2: drive supports 16-sector PIO transfers, LBA48 addressing
[     2.660025] wd2: 931 GB, 1938021 cyl, 16 head, 63 sec, 512 bytes/sect x 1953525168 sectors (0 bytes/physsect; first aligned sector: 8)
[     2.850025] wd2: GPT GUID: 01d01c56-2caf-4370-ac48-634c4c211de7
[     2.850025] dk3 at wd2: "raid1cmp0", 1953525088 blocks at 40, type: raidframe
[     3.370025] wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), WRITE DMA FUA, NCQ (32 tags)
[     3.400024] wd2(ahcisata0:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), WRITE DMA FUA EXT


jupiter$ dmesg|grep wd3
[     3.400024] wd3 at atabus3 drive 0
[     3.400024] wd3: <ST1000LM048-2E7172>
[     3.400024] wd3: drive supports 16-sector PIO transfers, LBA48 addressing
[     3.400024] wd3: 931 GB, 1938021 cyl, 16 head, 63 sec, 512 bytes/sect x 1953525168 sectors (0 bytes/physsect; first aligned sector: 8)
[     3.520024] wd3: GPT GUID: aabb5ee0-c30f-4654-9380-3ab8ca81cd9b
[     3.520024] dk4 at wd3: "raid1cmp1", 1953525088 blocks at 40, type: raidframe
[     3.530024] wd3: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), WRITE DMA FUA, NCQ (32 tags)
[     3.560024] wd3(ahcisata0:3:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), WRITE DMA FUA EXT
```

That doesn't look bad at first glance. The hard disks identify themselves as follows:

```
jupiter$ doas atactl wd2 identify
Model: ST1000LM048-2E7172, Rev: SDM1, Serial #:             WES22ZJS
World Wide Name: 5000C5009D54CC56
Device type: ATA, fixed
Capacity 1000 Gbytes, 1953525168 sectors, 512 bytes/sector
Cylinders: 16383, heads: 16, sec/track: 63
Physical sector size: 4096 bytes
First physically aligned sector: 8
Command queue depth: 32
Device capabilities:
        DMA
        LBA
        ATA standby timer values
        IORDY operation
        IORDY disabling
Device supports following standards:
ATA-4 ATA-5 ATA-6 ATA-7 ATA-8
Command set support:
        NOP command (enabled)
        READ BUFFER command (enabled)
        WRITE BUFFER command (enabled)
        Host Protected Area feature set (enabled)
        Look-ahead (enabled)
        Write cache (disabled)
        Power Management feature set (enabled)
        Security Mode feature set (disabled)
        SMART feature set (enabled)
        FLUSH CACHE EXT command (enabled)
        FLUSH CACHE command (enabled)
        Device Configuration Overlay feature set (enabled)
        48-bit Address feature set (enabled)
        SET MAX security extension (disabled)
        SET FEATURES required to spin-up after power-up (enabled)
        Power-Up In Standby feature set (disabled)
        Advanced Power Management feature set (enabled)
        DOWNLOAD MICROCODE command (enabled)
        World Wide Name
        WRITE DMA/MULTIPLE FUA EXT commands
        General Purpose Logging feature set
        SMART self-test
        SMART error logging
Serial ATA capabilities:
        1.5Gb/s signaling
        3.0Gb/s signaling
        6.0Gb/s signaling
        Native Command Queuing
        Host-Initiated Interface Power Management
        PHY Event Counters
Serial ATA features:
        DMA Setup Auto Activate (disabled)
        Device-Initiated Interface Power Managment (disabled)
        Software Settings Preservation (enabled)


jupiter$ doas atactl wd3 identify
Model: ST1000LM048-2E7172, Rev: SDM1, Serial #:             WES23Y53
World Wide Name: 5000C5009D54C4C7
Device type: ATA, fixed
Capacity 1000 Gbytes, 1953525168 sectors, 512 bytes/sector
Cylinders: 16383, heads: 16, sec/track: 63
Physical sector size: 4096 bytes
First physically aligned sector: 8
Command queue depth: 32
Device capabilities:
        DMA
        LBA
        ATA standby timer values
        IORDY operation
        IORDY disabling
Device supports following standards:
ATA-4 ATA-5 ATA-6 ATA-7 ATA-8
Command set support:
        NOP command (enabled)
        READ BUFFER command (enabled)
        WRITE BUFFER command (enabled)
        Host Protected Area feature set (enabled)
        Look-ahead (enabled)
        Write cache (enabled)
        Power Management feature set (enabled)
        Security Mode feature set (disabled)
        SMART feature set (enabled)
        FLUSH CACHE EXT command (enabled)
        FLUSH CACHE command (enabled)
        Device Configuration Overlay feature set (enabled)
        48-bit Address feature set (enabled)
        SET MAX security extension (disabled)
        SET FEATURES required to spin-up after power-up (enabled)
        Power-Up In Standby feature set (disabled)
        Advanced Power Management feature set (enabled)
        DOWNLOAD MICROCODE command (enabled)
        World Wide Name
        WRITE DMA/MULTIPLE FUA EXT commands
        General Purpose Logging feature set
        SMART self-test
        SMART error logging
Serial ATA capabilities:
        1.5Gb/s signaling
        3.0Gb/s signaling
        6.0Gb/s signaling
        Native Command Queuing
        Host-Initiated Interface Power Management
        PHY Event Counters
Serial ATA features:
        DMA Setup Auto Activate (disabled)
        Device-Initiated Interface Power Managment (disabled)
        Software Settings Preservation (enabled)

```
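
The two listings are long and nearly identical, so single differences are easy to overlook. In the output above, wd2 reports "Write cache (disabled)" while wd3 reports "Write cache (enabled)". A small filter like the following pulls out just that line for a side-by-side comparison. It is a sketch with a made-up function name, and it only relies on the line format shown in the listings above.

```sh
#!/bin/sh
# write_cache_state: read "atactl <disk> identify" output on stdin and print
# the state shown on its "Write cache" line (e.g. "enabled" or "disabled").
write_cache_state() {
    sed -n 's/^[[:space:]]*Write cache (\(.*\))$/\1/p'
}

# Example:
#   atactl wd2 identify | write_cache_state
#   atactl wd3 identify | write_cache_state
```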

From this I could see that they do indeed have 4k sectors. To be on the safe side, I also checked the SMART values. They look good to me, or am I wrong?

```
jupiter$ doas atactl wd2 smart status
SMART supported, SMART enabled
id value thresh crit collect reliability description                 raw
  1  83    6     yes online  positive    Raw read error rate         206855044
  3  99    0     yes online  positive    Spin-up time                0
  4  91   20     no  online  positive    Start/stop count            9956
  5 100   36     yes online  positive    Reallocated sector count    0
  7  78   45     yes online  positive    Seek error rate             17436089838
  9  85    0     no  online  positive    Power-on hours count        188209761891359
 10 100   97     yes online  positive    Spin retry count            0
 12  98   20     no  online  positive    Device power cycle count    2909
184 100   99     no  online  positive    End-to-end error            0
187 100    0     no  online  positive    Reported Uncorrectable Errors 0
188 100    0     no  online  positive    Command Timeout             0
189 100    0     no  online  positive    High Fly Writes             0
190  50   40     no  online  positive    Airflow Temperature         50 Lifetime min/max 34/0
191 100    0     no  online  positive    G-sense error rate          179
192 100    0     no  online  positive    Power-off retract count     27
193   1    0     no  online  positive    Load cycle count            465174
194  50    0     no  online  positive    Temperature                 50 Lifetime min/max 0/12
197 100    0     no  online  positive    Current pending sector      0
198 100    0     no  offline positive    Offline uncorrectable       0
199 200    0     no  online  positive    Ultra DMA CRC error count   0
240 100    0     no  offline positive    Head flying hours           43877385902262
241 100    0     no  offline positive    Total LBAs Written          29912118872
242 100    0     no  offline positive    Total LBAs Read             25001009691
254 100    0     no  online  positive    Free Fall Sensor            0


jupiter$ doas atactl wd3 smart status
SMART supported, SMART enabled
id value thresh crit collect reliability description                 raw
  1  76    6     yes online  positive    Raw read error rate         41698710
  3  99    0     yes online  positive    Spin-up time                0
  4 100   20     no  online  positive    Start/stop count            14
  5 100   36     yes online  positive    Reallocated sector count    0
  7  69   45     yes online  positive    Seek error rate             7545197
  9 100    0     no  online  positive    Power-on hours count        1155346202744
 10 100   97     yes online  positive    Spin retry count            0
 12 100   20     no  online  positive    Device power cycle count    14
184 100   99     no  online  positive    End-to-end error            0
187 100    0     no  online  positive    Reported Uncorrectable Errors 0
188 100    0     no  online  positive    Command Timeout             1
189 100    0     no  online  positive    High Fly Writes             0
190  58   40     no  online  positive    Airflow Temperature         42 Lifetime min/max 39/0
191 100    0     no  online  positive    G-sense error rate          0
192 100    0     no  online  positive    Power-off retract count     5
193 100    0     no  online  positive    Load cycle count            256
194  42    0     no  online  positive    Temperature                 42 Lifetime min/max 0/22
197 100    0     no  online  positive    Current pending sector      0
198 100    0     no  offline positive    Offline uncorrectable       0
199 200    0     no  online  positive    Ultra DMA CRC error count   0
240 100    0     no  offline positive    Head flying hours           125413045043241
241 100    0     no  offline positive    Total LBAs Written          4269631011
242 100    0     no  offline positive    Total LBAs Read             6033874459
254 100    0     no  online  positive    Free Fall Sensor            0

```

The partition tables on the raw disks look like this:

```
jupiter$ doas gpt show -a wd2
       start        size  index  contents
           0           1         PMBR
           1           1         Pri GPT header
           2          32         Pri GPT table
          34           6         Unused
          40  1953525088      1  GPT part - NetBSD RAIDFrame component
                                 Type: raid
                                 TypeID: 49f48daa-b10e-11dc-b99b-0019d1879648
                                 GUID: c9e7c689-5708-482d-a7bc-9f622d596fb1
                                 Size: 932 G
                                 Label: raid1cmp0
                                 Attributes: None
  1953525128           7         Unused
  1953525135          32         Sec GPT table
  1953525167           1         Sec GPT header
jupiter$ doas gpt show -a wd3
       start        size  index  contents
           0           1         PMBR
           1           1         Pri GPT header
           2          32         Pri GPT table
          34           6         Unused
          40  1953525088      1  GPT part - NetBSD RAIDFrame component
                                 Type: raid
                                 TypeID: 49f48daa-b10e-11dc-b99b-0019d1879648
                                 GUID: 106b5ce0-3a27-4b4d-8c5f-c8b45fac7651
                                 Size: 932 G
                                 Label: raid1cmp1
                                 Attributes: None
  1953525128           7         Unused
  1953525135          32         Sec GPT table
  1953525167           1         Sec GPT header
```

The partition table on the RAID looks like this:

```
jupiter$ doas gpt show raid1
       start        size  index  contents
           0           1         PMBR
           1           1         Pri GPT header
           2          32         Pri GPT table
          34           6         Unused
          40  1953524912      1  GPT part - NetBSD FFSv1/FFSv2
  1953524952           7         Unused
  1953524959          32         Sec GPT table
  1953524991           1         Sec GPT header
```
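
The start offsets in the tables above can be checked against the 4 KiB physical sector size with a little shell arithmetic. This is a sketch with made-up function names, assuming the offsets printed by gpt are in 512-byte sectors. Note that the start of the partition on raid1 is relative to the start of the RAID device, not to the physical disk; RAIDframe reserves some space at the beginning of each component, which this check does not account for.

```sh
#!/bin/sh
# aligned START_SECTOR: succeed if a start offset given in 512-byte sectors
# falls on a 4096-byte boundary.
aligned() {
    [ $(( $1 * 512 % 4096 )) -eq 0 ]
}

# check NAME START_SECTOR: print a one-line verdict for one partition.
check() {
    if aligned "$2"; then
        echo "$1: start $2 is 4 KiB aligned"
    else
        echo "$1: start $2 is NOT 4 KiB aligned"
    fi
}

# Start sectors taken from the gpt show output above.
check wd2 40
check wd3 40
check raid1 40
```

All three partitions start at sector 40, which is 20480 bytes, a multiple of 4096.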

What can I try next? Have I made an obvious mistake?

Kind regards
Matthias


[1] https://zhadum.org.uk/2008/07/25/raid-and-file-system-performance-tuning/
