NetBSD-Bugs archive


kern/52462: aac driver (and possibly others) needs MPification



>Number:         52462
>Category:       kern
>Synopsis:       aac driver (and possibly others) needs MPification
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 04 11:25:00 +0000 2017
>Originator:     he%NetBSD.org@localhost
>Release:        NetBSD 8.0_BETA
>Organization:
	I Try...
>Environment:
System: NetBSD 8.0_BETA (GENERIC) #10: Thu Aug  3 13:44:33 CEST 2017 he%vk.urc.uninett.no@localhost:/usr/obj/sys/arch/amd64/compile/GENERIC
Architecture: x86_64
Machine: amd64
>Description:
	Between netbsd-7 and netbsd-8, the 'ld' driver appears to have
	been adapted to run multithreaded.  Any one of several hardware
	drivers can sit beneath 'ld', at least aac, cac, icp, mlx and
	nvme.  At least the 'aac' driver has not been adapted to run
	multithreaded; it still uses splbio() for exclusion, and
	that apparently creates problems.

	I discovered this when trying to upgrade a machine from
	netbsd-7 to netbsd-8 which has this controller and disks:

aac0 at pci6 dev 0 function 0: IBM ServeRAID 8k
aac0:0 interrupting at ioapic0 pin 17
aac0: Enabling 64-bit address support
aac0: Enable 64-bit array support
aac0: New comm. interface enabled
aac0: MIPS 5KC at 250MHz, 32MB mem (16MB cache), optional battery not installed
ld0 at aac0 unit 0: RAID 1 (Mirror)
ld0: 232 GB, 30378 cyl, 255 head, 63 sec, 512 bytes/sect x 488036352 sectors
ld1 at aac0 unit 1: RAID 10
ld1: 465 GB, 60757 cyl, 255 head, 63 sec, 512 bytes/sect x 976072704 sectors

	The 8.0_BETA kernel booted, and I managed to extract the
	'base' set and 29% of the 'comp' set before it panicked:

fatal page fault in supervisor mode
trap type 6 code 0 rip 0xffffffff8051cee3 cs 0x8 rflags 0x10246 cr2 0 ilevel 0x6 rsp 0xfffffe810e8fecd8
curlwp 0xfffffe823f731840 pid 0.2 lowest kstack 0xfffffe810e8fc2c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
aac_new_intr() at netbsd:aac_new_intr+0xb0
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
Xintr_ioapic_level1() at netbsd:Xintr_ioapic_level1+0xf2
--- interrupt ---
x86_stihlt() at netbsd:x86_stihlt+0x6
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xdb
idle_loop() at netbsd:idle_loop+0xf8
cpu0: End traceback...

	Catching it in ddb points towards a null pointer dereference inside
	the expansion of the SIMPLEQ_FIRST macro in aac_ccb_enqueue, even
	though on the surface this looks decidedly nonsensical.
     
	So now the machine is hosed and requires a physical visit, even
	though I have a remote serial console.

	Apparently the aac driver needs to be properly converted to use the
	multithreaded synchronization primitives.  It is not as simple as
	replacing splbio() with mutex_enter() and splx() with mutex_exit(),
	because certain operations are not permitted while holding such a
	mutex, such as doing malloc() and calling bus_* functions.  Converting
	the driver while retaining its required mutual exclusion is
	therefore not trivial, and certainly above my current abilities,
	so "Help!"
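
	One common way around the restriction described above is to do
	anything that may block, such as allocation, before taking the
	lock, and to keep the locked region down to the queue pointer
	updates.  The following is only a minimal userland sketch of that
	pattern, using pthreads in place of the kernel's mutex primitives.
	The names struct cmd, struct cmd_queue and the queue_* functions
	are hypothetical and are not taken from the aac driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver command block (not the real aac ccb). */
struct cmd {
	struct cmd *next;
	int id;
};

/* Hypothetical queue; the pthread mutex plays the role of a kernel mutex. */
struct cmd_queue {
	pthread_mutex_t lock;
	struct cmd *head;
	struct cmd **tailp;	/* address of the last "next" field, or of head */
	int depth;
};

static void
queue_init(struct cmd_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = NULL;
	q->tailp = &q->head;
	q->depth = 0;
}

/* Allocate first, unlocked; hold the lock only for the pointer updates. */
static int
queue_submit(struct cmd_queue *q, int id)
{
	struct cmd *c = malloc(sizeof(*c));	/* may block, so done unlocked */

	if (c == NULL)
		return -1;
	c->next = NULL;
	c->id = id;

	pthread_mutex_lock(&q->lock);
	*q->tailp = c;
	q->tailp = &c->next;
	q->depth++;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/* Unlink the head under the lock; the caller frees it after unlocking. */
static struct cmd *
queue_take(struct cmd_queue *q)
{
	struct cmd *c;

	pthread_mutex_lock(&q->lock);
	c = q->head;
	if (c != NULL) {
		assert(q->depth > 0);
		q->head = c->next;
		if (q->head == NULL)
			q->tailp = &q->head;
		q->depth--;
	}
	pthread_mutex_unlock(&q->lock);
	return c;
}
```

	Whether the same split is workable in aac depends on which of its
	code paths allocate or call bus_* functions inside what is today
	an splbio() section; the sketch does not attempt to answer that.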


>How-To-Repeat:
	Put even a minimal load on an 8.0_BETA machine with an 'aac' disk
	subsystem, and watch it panic.

>Fix:
	Sorry, I don't know; see the description above.


