Port-arm archive


Re: pcictl pci0 list panics on ROCKPro64



Hi,

On 2023/11/24 18:08, Tobias Nygren wrote:
> On Fri, 24 Nov 2023 17:47:31 +0900
> Masanobu SAITOH <msaitoh%execsw.org@localhost> wrote:
>
> > If a PCIe card is inserted into the PCIe slot, "pcictl pci0 list"
> > panics with high probability.
>
> I think we had this bug for a long time. One special thing about the
> rk3399 pcie is that it is prone to generating a synchronous data abort
> trap when attempting to read config space that is not claimed by any
> device (as is done when probing for existing bdf triplets). For this
> reason it has to use bus_space_peek to hide the trap from the caller.
> There's not much other ARM code that uses bus_space_peek so it is where
> I would start looking.


Some (incomplete) analysis:

For rk3399_pcie, when the target bus:dev:func is absent, a read
of configuration space results in an external abort due to an
AXI slave error.
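
To make the probing pattern concrete, here is a rough sketch in plain
userland C. It is not the rk3399_pcie code: conf_peek_id(), the
simulated topology and the ID value are all made up for illustration.
The only point is that a failed peek is treated as "nothing there"
while walking every dev:func on a bus.

```c
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MAXDEV	32	/* devices per bus */
#define SKETCH_MAXFUNC	8	/* functions per device */

/*
 * Hypothetical peek: returns 0 and the ID register, or -1 when the
 * access would have faulted because nothing claims bus:dev:func.
 */
static int
conf_peek_id(int bus, int dev, int func, uint32_t *idp)
{
	/* Simulated topology (assumption): a single function at 0:0:0. */
	if (bus == 0 && dev == 0 && func == 0) {
		*idp = 0x01001d87;	/* made-up ID value */
		return 0;
	}
	return -1;
}

/* Count the functions that respond on a bus; a fault means "absent". */
static int
scan_bus(int bus)
{
	int found = 0;

	for (int dev = 0; dev < SKETCH_MAXDEV; dev++) {
		for (int func = 0; func < SKETCH_MAXFUNC; func++) {
			uint32_t id;

			if (conf_peek_id(bus, dev, func, &id) != 0)
				continue;	/* unclaimed: skip quietly */
			printf("%03d:%02d:%d: 0x%08x\n", bus, dev, func,
			    (unsigned)id);
			found++;
		}
	}
	return found;
}
```

A real scan is smarter than this (it normally looks at function 0
before trying the other functions), but every probe of an empty slot
still goes through the fault-recovery path.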

On the A53 cores (cpu0-3, little), a synchronous exception is raised for
the load instruction. It is handled by data_abort_handler(), which
successfully returns via cpu_jump_onfault().

On the A72 cores (cpu4-5, big), on the other hand, an asynchronous
SError is raised belatedly for dsb(lb). It is handled by
trap_el1h_error(), which also returns via cpu_jump_onfault(), but
something goes wrong in this case.

This should explain why the panic occurs only on the big cores.

An SError is imprecise and, IIUC, not recoverable in general unless
the RAS extension is implemented (unfortunately, it is missing on the A72).

Linux just panics in its SError handler if the RAS extension is absent.
Probably, this does not cause problems there just because only an A53
core (cpu0) is used for device configuration.

I guess that *something* in the kernel context is clobbered during the
imprecise exception, but I have not yet figured out what it is.
I wonder whether we can recover from the SError exception, at least
in this case.

In a different direction, I tried to get a synchronous exception
raised on the A72 cores as well, but in vain. Even though the registers
for rk3399_pcie are mapped with BUS_SPACE_MAP_NONPOSTED (nGnRnE),
that flag is presumably ignored, since they are located in a region
mapped by pmap_devmap(). I tried a kernel in which the PCIe registers
are excluded from devmap, but nothing changed.

That's all at the moment...

Thanks,
rin

