Port-amd64 archive
major ATI IXP ahcisata lossage and unexpected raidframe save
I have an ABIT A-S78H 780G Socket AM2+ running NetBSD-5/amd64
and suffered major lossage on the SATA controller.
Under disk load the system would suddenly start churning out
messages like:
wd0a: error writing fsbn 891505504 of 891505504-891505535 (wd0 bn
891505567; cn 884430 tn 2 sn 1), retrying
wd0: (interface CRC error)
ahcisata0 port 0: device present, speed: 3.0Gb/s
wd0: soft error (corrected)
wd0a: error writing fsbn 1170739456 of 1170739456-1170739487 (wd0
bn 1170739519; cn 1161447 tn 14 sn 61), retrying
wd0: (interface CRC error)
ahcisata0 port 0: device present, speed: 3.0Gb/s
wd0: soft error (corrected)
This had happened once before but a reboot seemed to clear
it. This time it just kept coming back. A RAID rebuild would
trigger it, as would a cvs update or even an rsync. It managed to toast
my pkgsrc checkout and a locally hosted svn repo. At one
point I left it retrying for a couple of hours without any
benefit. Switching the SATA controller from AHCI to Native
or Compatible IDE in the BIOS didn't help.
The system has two ~1TB RAID1 raidframe mirrors, one for
system and another for archive data. The second one hasn't
triggered any issues (though obviously doesn't have the
same usage pattern).
Pulling one or other of the 'problem' disks in the raid
mirror just left the errors on the other. Testing one
of them in another machine failed to reproduce the
issue.
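For anyone wanting to compare notes, here is a rough sketch of how the
errors could be tallied per drive. It assumes log lines in the same
format as quoted above; the function name and the sample list are just
for illustration, not anything NetBSD ships.

```python
import re
from collections import Counter

# Matches kernel messages such as "wd0: (interface CRC error)".
# search() is used so a syslog timestamp/host prefix does not matter.
CRC_RE = re.compile(r"\b(wd\d+): \(interface CRC error\)")


def count_crc_errors(lines):
    """Return a Counter mapping drive name (e.g. 'wd0') to CRC error count."""
    counts = Counter()
    for line in lines:
        match = CRC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input copied from the messages quoted earlier in this post.
sample = [
    "wd0: (interface CRC error)",
    "ahcisata0 port 0: device present, speed: 3.0Gb/s",
    "wd0: soft error (corrected)",
    "wd0: (interface CRC error)",
    "ahcisata0 port 0: device present, speed: 3.0Gb/s",
    "wd0: soft error (corrected)",
]

crc_counts = count_crc_errors(sample)
```

Feeding it the lines of a saved log file (wherever your syslog puts
kernel messages) should show whether the errors stick to one drive or
follow the controller.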
Potentially relevant dmesg lines:
ahcisata0 at pci0 dev 17 function 0: vendor 0x1002 product 0x4391
ahcisata0: interrupting at ioapic0 pin 22
ahcisata0: AHCI revision 1.1, 6 ports, 32 command slots, features
0xf7228080
atabus0 at ahcisata0 channel 0
atabus1 at ahcisata0 channel 1
atabus2 at ahcisata0 channel 2
atabus3 at ahcisata0 channel 3
atabus4 at ahcisata0 channel 4
atabus5 at ahcisata0 channel 5
ahcisata0 port 1: device present, speed: 3.0Gb/s
ahcisata0 port 3: device present, speed: 3.0Gb/s
ahcisata0 port 4: device present, speed: 3.0Gb/s
ahcisata0 port 5: device present, speed: 3.0Gb/s
wd0 at atabus1 drive 0: <SAMSUNG HD103UJ>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 931 GB, 1938021 cyl, 16 head, 63 sec, 512 bytes/sect x
1953525168 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 7
wd0(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6
(Ultra/133) (using DMA)
wd1 at atabus3 drive 0: <SAMSUNG HD154UI>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 1397 GB, 2907021 cyl, 16 head, 63 sec, 512 bytes/sect x
2930277168 sectors
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 7
wd1(ahcisata0:3:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6
(Ultra/133) (using DMA)
wd2 at atabus4 drive 0: <SAMSUNG HD154UI>
wd2: drive supports 16-sector PIO transfers, LBA48 addressing
wd2: 1397 GB, 2907021 cyl, 16 head, 63 sec, 512 bytes/sect x
2930277168 sectors
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 7
wd2(ahcisata0:4:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6
(Ultra/133) (using DMA)
wd3 at atabus5 drive 0: <SAMSUNG HD154UI>
wd3: drive supports 16-sector PIO transfers, LBA48 addressing
wd3: 1397 GB, 2907021 cyl, 16 head, 63 sec, 512 bytes/sect x
2930277168 sectors
wd3: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 7
wd3(ahcisata0:5:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6
(Ultra/133) (usin
Anyone seen anything similar?
It seems to have gone away for now, but I'm specifically not
stressing those disks until I get some services migrated off
this box.
Oh, and the raidframe save: at one point raidframe hard failed one
of the disks, so the subsequent reboot death spiral left it alone.
So *that* became the 'good' disk (which didn't have the svn
repo hosed).
Of course I have regular dirvish backups of everything, but it's
nice to be able to use them as a check rather than a rebuild.