Port-amd64 archive


major ATI IXP ahcisata lossage and unexpected raidframe save



        I have an ABIT A-S78H 780G Socket AM2+ running NetBSD-5/amd64
        and suffered major lossage on the SATA controller.

        Under disk load the system would suddenly start churning out
        messages like:

            wd0a: error writing fsbn 891505504 of 891505504-891505535 (wd0 bn 891505567; cn 884430 tn 2 sn 1), retrying
            wd0: (interface CRC error)
            ahcisata0 port 0: device present, speed: 3.0Gb/s
            wd0: soft error (corrected)
            wd0a: error writing fsbn 1170739456 of 1170739456-1170739487 (wd0 bn 1170739519; cn 1161447 tn 14 sn 61), retrying
            wd0: (interface CRC error)
            ahcisata0 port 0: device present, speed: 3.0Gb/s
            wd0: soft error (corrected)
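
        As a sanity check on the numbers in those messages, here is a
        minimal Python sketch. It assumes the cn/tn/sn triple is a plain
        flattening of the block number using the 16 head, 63 sector
        geometry that dmesg reports for wd0, with sn counted from zero.
        The function names are made up for illustration. If those
        assumptions hold, the difference between "wd0 bn" and "fsbn"
        would be the starting offset of wd0a on the disk.

```python
# Geometry reported for wd0 in dmesg: 16 heads, 63 sectors per track.
HEADS = 16
SECTORS_PER_TRACK = 63

# (fsbn, wd0 bn, cn, tn, sn) copied from the two error messages above.
ERRORS = [
    (891505504, 891505567, 884430, 2, 1),
    (1170739456, 1170739519, 1161447, 14, 61),
]


def chs_to_block(cn, tn, sn):
    """Flatten cylinder/track/sector into a block number (assumes sn is 0-based)."""
    return (cn * HEADS + tn) * SECTORS_PER_TRACK + sn


def check(fsbn, bn, cn, tn, sn):
    """Return the implied partition offset and whether the CHS triple matches bn."""
    return {
        "partition_offset": bn - fsbn,
        "chs_matches": chs_to_block(cn, tn, sn) == bn,
    }


if __name__ == "__main__":
    for entry in ERRORS:
        print(check(*entry))
```

        Both messages give an offset of 63 sectors and a matching CHS
        triple, so the reported block numbers are at least internally
        consistent; the failures do not look like garbled addressing.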


        This had happened once before, but a reboot seemed to clear
        it. This time it just kept coming back. A raid rebuild would
        trigger it, as would a cvs update or even an rsync. It managed
        to toast my pkgsrc checkout and a locally hosted svn repo. At
        one point I left it retrying for a couple of hours without any
        benefit. Switching the SATA controller from AHCI to Native
        or Compatible IDE in the BIOS didn't help.

        The system has two ~1TB RAID1 raidframe mirrors, one for
        system and another for archive data. The second one hasn't
        triggered any issues (though obviously doesn't have the
        same usage pattern).

        Pulling one or other of the 'problem' disks in the raid
        mirror just left the errors on the other. Testing one
        of them in another machine failed to reproduce the
        issue.

        Potentially relevant dmesg lines:

            ahcisata0 at pci0 dev 17 function 0: vendor 0x1002 product 0x4391
            ahcisata0: interrupting at ioapic0 pin 22
            ahcisata0: AHCI revision 1.1, 6 ports, 32 command slots, features 0xf7228080
            atabus0 at ahcisata0 channel 0
            atabus1 at ahcisata0 channel 1
            atabus2 at ahcisata0 channel 2
            atabus3 at ahcisata0 channel 3
            atabus4 at ahcisata0 channel 4
            atabus5 at ahcisata0 channel 5
            ahcisata0 port 1: device present, speed: 3.0Gb/s
            ahcisata0 port 3: device present, speed: 3.0Gb/s
            ahcisata0 port 4: device present, speed: 3.0Gb/s
            ahcisata0 port 5: device present, speed: 3.0Gb/s
            wd0 at atabus1 drive 0: <SAMSUNG HD103UJ>
            wd0: drive supports 16-sector PIO transfers, LBA48 addressing
            wd0: 931 GB, 1938021 cyl, 16 head, 63 sec, 512 bytes/sect x 1953525168 sectors
            wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 7
            wd0(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
            wd1 at atabus3 drive 0: <SAMSUNG HD154UI>
            wd1: drive supports 16-sector PIO transfers, LBA48 addressing
            wd1: 1397 GB, 2907021 cyl, 16 head, 63 sec, 512 bytes/sect x 2930277168 sectors
            wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 7
            wd1(ahcisata0:3:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
            wd2 at atabus4 drive 0: <SAMSUNG HD154UI>
            wd2: drive supports 16-sector PIO transfers, LBA48 addressing
            wd2: 1397 GB, 2907021 cyl, 16 head, 63 sec, 512 bytes/sect x 2930277168 sectors
            wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 7
            wd2(ahcisata0:4:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
            wd3 at atabus5 drive 0: <SAMSUNG HD154UI>
            wd3: drive supports 16-sector PIO transfers, LBA48 addressing
            wd3: 1397 GB, 2907021 cyl, 16 head, 63 sec, 512 bytes/sect x 2930277168 sectors
            wd3: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 7
            wd3(ahcisata0:5:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (usin
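
        For reference, the geometry figures in those dmesg lines can be
        cross-checked with a short Python sketch. The table and function
        names are illustrative only, and the assumption that the "GB"
        figure is the byte count divided by 2^30 and rounded down is
        mine, not something the dmesg output states.

```python
# (cylinders, heads, sectors per track, total sectors, reported "GB")
# copied from the dmesg lines for wd0 (HD103UJ) and wd1 (HD154UI).
DRIVES = {
    "wd0": (1938021, 16, 63, 1953525168, 931),
    "wd1": (2907021, 16, 63, 2930277168, 1397),
}

SECTOR_SIZE = 512  # bytes/sect, as reported by dmesg


def geometry_sectors(cyl, heads, spt):
    """Total sectors implied by the reported cylinder/head/sector geometry."""
    return cyl * heads * spt


def capacity_gib(sectors, sector_size=SECTOR_SIZE):
    """Capacity in units of 2**30 bytes, rounded down (assumed dmesg convention)."""
    return sectors * sector_size // 2**30


if __name__ == "__main__":
    for name, (cyl, heads, spt, sectors, reported) in DRIVES.items():
        ok_geometry = geometry_sectors(cyl, heads, spt) == sectors
        ok_size = capacity_gib(sectors) == reported
        print(name, "geometry consistent:", ok_geometry, "size consistent:", ok_size)
```

        Under that assumption both checks pass for both drive models, so
        the drives are at least being identified with a self-consistent
        size.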

        Anyone seen anything similar?

        It seems to have gone away for now, but I'm specifically not
        stressing those disks until I get some services migrated off
        this box.

        Oh, and the raidframe save: at one point raidframe hard failed one
        of the disks, so the subsequent reboot death spiral left it alone.
        So *that* became the 'good' disk (which didn't have the svn
        repo hosed).
        Of course I have regular dirvish backups of everything, but it's
        nice to be able to use them as a check rather than a rebuild.


