NetBSD-Bugs archive
kern/59130: ataraid(4), ld(4): VIA RAID crashes on VX800 because of drive index misconfiguration
>Number: 59130
>Category: kern
>Synopsis: ataraid(4), ld(4): VIA RAID crashes on VX800 because of drive index misconfiguration
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Mar 04 23:10:00 +0000 2025
>Originator: Andrius V
>Release: current, netbsd-10, netbsd-9
>Organization:
>Environment:
>Description:
While fixing and testing the VX800 SATA/IDE controller in RAID mode, I noticed that ld(4) fails to attach and causes a crash when RAID is set up using RAID firmware.
The crash is caused by a null pointer dereference in ata_raid_disk_vnode_find(),
where strcmp() compares device_xname(devlist) with device_xname(adi->dev):
adi->dev is NULL for the wd0 disk device.
After some investigation, I identified the root cause of the null pointer: a misconfiguration of the drive index in the ata_raid_read_config_via() function (located in ata_raid_via.c).
The line adi = &aai->aai_disks[drive] uses `drive' as an index, and its value matches the drive's channel (drive = atabus->sc_chan->ch_channel).
This works for SATA controllers with one device per channel, but the VX800 and several other VIA controllers have two drives per channel, so aai->aai_disks[channel] is overwritten by the second drive (in this case, wd1).
The fix should eliminate the reliance on the bus channel to determine the drive index. Potential solutions are provided in the fix section below.
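To illustrate the failure mode described above, here is a minimal user-space sketch, not the kernel code. The names struct slot, record_by_channel() and count_empty() are hypothetical stand-ins for aai->aai_disks[] and the assignment in ata_raid_read_config_via(). It only assumes what the description states: the slot index is the channel number, and both disks sit on the same channel.

```c
#include <assert.h>
#include <stddef.h>

#define NDISKS 2	/* two-disk array, as in the report */

/* Hypothetical stand-in for one entry of aai->aai_disks[]. */
struct slot {
	const char *dev;	/* NULL until a component is recorded */
};

/*
 * Record a component using the ATA channel number as the slot index,
 * which is how the description says the VIA code picks the slot.
 */
static void
record_by_channel(struct slot *slots, int channel, const char *xname)
{
	assert(channel >= 0 && channel < NDISKS);
	slots[channel].dev = xname;
}

/* Count the slots that never received a device. */
static int
count_empty(const struct slot *slots)
{
	int i, n = 0;

	for (i = 0; i < NDISKS; i++)
		if (slots[i].dev == NULL)
			n++;
	return n;
}
```

Calling record_by_channel(slots, 0, "wd0") and then record_by_channel(slots, 0, "wd1") on a zeroed two-entry array leaves slots[0] holding the second disk and slots[1] still NULL, so count_empty() returns 1. Any later code that walks all slots and dereferences dev will hit a NULL entry.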
bt and regs:
Searching for RAID components...
ataraid0: found 1 RAID volume
ld0 at ataraid0 vendtype 2 unit -2008953180: VIA V-RAID ATA RAID-1 array
uvm_fault(0xffffffff81b769e0, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 0xffffffff8100cceb cs 0x8 rflags 0x10282 cr2 0x44 ilevel 0 rsp 0xffffffff81fc0bb8
curlwp 0xffffffff81a8c040 pid 0.0 lowest kstack 0xffffffff81fbb2c0
kernel: page fault trap, code=0
Stopped in pid 0.0 (system) at netbsd:strcmp+0xb: movb 0(%rsi),%dl
strcmp() at netbsd:strcmp+0xb
ld_ataraid_attach() at netbsd:ld_ataraid_attach+0x1c9
config_attach_internal() at netbsd:config_attach_internal+0x1a7
config_found_acquire() at netbsd:config_found_acquire+0xd9
config_found() at netbsd:config_found+0x32
ataraid_attach() at netbsd:ataraid_attach+0x9e
config_attach_pseudo_acquire() at netbsd:config_attach_pseudo_acquire+0xae
config_attach_pseudo() at netbsd:config_attach_pseudo+0x11
ata_raid_finalize() at netbsd:ata_raid_finalize+0x2e
config_finalize() at netbsd:config_finalize+0x167
main() at netbsd:main+0x506
ds bb0
es 0
fs d000
gs 1f33
rdi ffff8134e0354a45
rsi 44
rbp ffffffff81fc0bf0
rbx ffff8134e0227b60
rdx 0
rcx ffffffffffffff
rax ffff8134e0354a77
r8 0
r9 0
r10 0
r11 ffff8880035f7000
r12 44
r13 ffff8134e039c8a8
r14 ffff8134e0354a00
r15 ffff8134e039c8a8
rip ffffffff8100cceb strcmp+0xb
cs 8
rflags 10282
rsp ffffffff81fc0bb8
ss 10
netbsd:strcmp+0xb: movb 0(%rsi),%dl
Metadata printed with ATA_RAID_DEBUG enabled:
wd0:
*************** ATA VIA Metadata ****************
magic 0xaa55
dummy_0 0x02
type RAID1
bootable 0
unknown 0
disk_index 0x00
stripe_layout 0x00
stripe_disks 0
stripe_sectors 8
disk_sectors 250069679
disk_id 0x8841cea4
DISK# disk_id
0 0x8841cea4
1 0x8a41cea4
checksum 0x26
=================================================
MAGIC == 0xaa55
wd1:
*************** ATA VIA Metadata ****************
magic 0xaa55
dummy_0 0x02
type RAID1
bootable 0
unknown 0
disk_index 0x04
stripe_layout 0x00
stripe_disks 0
stripe_sectors 8
disk_sectors 250069679
disk_id 0x8a41cea4
DISK# disk_id
0 0x8841cea4
1 0x8a41cea4
checksum 0x2c
=================================================
MAGIC == 0xaa55
>How-To-Repeat:
Set up RAID in firmware between two SATA devices on the same channel (type doesn't matter).
Boot NetBSD and observe the crash.
>Fix:
Currently, I see several options to fix the problem, but I am unsure which approach is the best. Please advise:
1) One option is to use aai->aai_curdisk++, as the code in ata_raid_intel does, but I would rather avoid this solution.
2) Another option is to match info->disk_id against info->disks[disk] in the loop that we already use to count the number of drives (inferred from the RAID info). This has been tested and works on the VX800 at least. To me this looks like the preferable choice:
	drive = -1;
	for (count = 0, disk = 0; disk < 8; disk++)
		if (info->disks[disk]) {
			if (info->disk_id == info->disks[disk])
				drive = count;
			count++;
		}
	....
	if (drive < 0 || drive >= aai->aai_ndisks) {
		aprint_error_dev(dksc->sc_dev,
		    "drive number %d doesn't make sense within %d-disk "
		    "array\n", drive, aai->aai_ndisks);
		error = EINVAL;
		goto out;
	}
3) disk_index could apparently be used as well (it is 0x00 for wd0 and 0x04 for wd1 in the metadata above), but I am unsure whether this holds in general:
drive = info->disk_index >> 2;
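Options 2 and 3 can be compared side by side in a small user-space sketch. This is only an illustration: struct via_info is a hypothetical, trimmed-down view of the metadata containing just the fields used here, and via_drive_by_id() and via_drive_by_index() are made-up helper names. The shift by 2 in option 3 is inferred from the two observed values only, so the second helper should be read as an unverified assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIA_MAX_DISKS 8	/* same bound as the "disk < 8" loop above */

/* Hypothetical subset of the VIA metadata fields used by options 2 and 3. */
struct via_info {
	uint32_t disk_id;
	uint8_t  disk_index;
	uint32_t disks[VIA_MAX_DISKS];
};

/*
 * Option 2: position of info->disk_id among the non-zero entries of
 * info->disks[]. Returns -1 if the id is not listed. *ndisks receives
 * the number of non-zero entries.
 */
static int
via_drive_by_id(const struct via_info *info, int *ndisks)
{
	int disk, count = 0, drive = -1;

	assert(info != NULL && ndisks != NULL);
	for (disk = 0; disk < VIA_MAX_DISKS; disk++) {
		if (info->disks[disk] == 0)
			continue;
		if (info->disk_id == info->disks[disk])
			drive = count;
		count++;
	}
	*ndisks = count;
	return drive;
}

/*
 * Option 3: derive the drive number from disk_index. The ">> 2" is an
 * assumption based on the observed values 0x00 and 0x04. Returns -1 if
 * the result does not fit an ndisks-disk array.
 */
static int
via_drive_by_index(const struct via_info *info, int ndisks)
{
	int drive;

	assert(info != NULL);
	drive = info->disk_index >> 2;
	return (drive < ndisks) ? drive : -1;
}
```

Fed with the metadata printed above (disk_id 0x8841cea4 with disk_index 0x00 for wd0, disk_id 0x8a41cea4 with disk_index 0x04 for wd1), both helpers yield drive 0 for wd0 and drive 1 for wd1 in a 2-disk array. Option 2 relies only on the disk_id table, whereas option 3 depends on an encoding of disk_index that has been seen on a single controller.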