[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)
On 2019/01/08 17:33, Nick Hudson wrote:
On 07/01/2019 10:22, Rin Okuyama wrote:
This kind of problems cannot be handled in software unless we
(1) use cached memory (for which unaligned access is allowed), or
(2) forbid compiler to generate unaligned access.
adds ARMV6_CACHED_DMA_MEMORY option. If enabled, DMA memory is forcibly
mapped cacheable on ARMv6+ [option (1) above]. This allows us unaligned
access to DMA buffer, however, as Nick and others pointed out, breaks
drivers that do not invalidate cache appropriately. If this option is
disabled, -mno-unaligned-access is added to CFLAGS [option (2) above].
I've tested both on my RPI3B+ (earmv7hf). Kernel of (1) works more than
12 hours without panic. However, vchiq(4) fails to initialize, and mue(4)
receives strange packets of zero-length (two times in 12 hours). Both
smell like driver bugs. Kernel of (2) works without problems as far as
I can see.
Does it work on rpi1? I forget if ARM11_COMPAT_MMU behaves differently here.
I bought my RPI0W yesterday to test it :-).
If ARMV6_CACHED_DMA_MEMORY option is specified, the kernel hangs
indefinitely when attaching audio0 at vcaudio0; unlike RPI3B+, vchiq0
and vcaudio0 are attached, however the system freezes before attaching
message of audio0. If vcaudio0 is disabled in the kernel config file,
the system worked without any troubles more than 12 hours.
The kernel built with -mno-unaligned-access works fine. Also, even if
neither ARMV6_CACHED_DMA_MEMORY nor -mno-unaligned-access options are
specified, axe(4) works without my hack in the current revision. I
confirmed that an instruction is generated which does unaligned access
to USB buffer.
This indicates that unaligned access is permitted even for non-cached
memory on ARMv6 in backward compatible MMU mode. Is it right? (I could
not find any documents about it...)
If it is true, we can possibly restrict ARMV6_CACHED_DMA_MEMORY option
to !ARM_MMU_V6C case. For arch/arm/arm32/bus_dma.c:
#if defined(ARMV6_CACHED_DMA_MEMORY) && !ARM_MMU_V6C
* Force cached mapping for ARMv6+, which allows us
* unaligned access to DMA buffer. For ARMv6 MMU in
* backward compatible mode, unaligned access is
* permitted for non-cached memory.
if (!CPU_IS_ARMV6_P() && !CPU_IS_ARMV7_P())
bool uncached = (flags & BUS_DMA_COHERENT);
Also, we can omit -mno-unaligned-access option in
.if !empty(MACHINE_ARCH:Mearmv6*) || !empty(MACHINE_ARCH:Mearmv7*)
. if ARMV6_CACHED_DMA_MEMORY
# Use cached memory for DMA on ARMv6+, which allows us unaligned
# access to DMA buffer.
. elif ARM11_COMPAT_MMU
# For ARMv6 MMU in backward compatible mode, unaligned access is
# permitted for non-cached memory.
# Otherwise, we need strictly-aligned access to DMA buffer.
I wrote a patch including both changes:
How do you think?
I also carried out simple benchmarks of building lang/perl5 of pkgsrc.
The working directory is USB SSD, TMPDIR is tmpfs, and terminal is ssh.
The difference is negligible: (1) 25:36.53 and (2) 25:34.81.
Using cached memory for DMA is slower?
Yes, at least for this single tries. However, it is only about 0.1%
difference, and I think it is within error margin. We may obtain
opposite results in another tries. Anyway, the difference would be
We should use cached memory for DMA in the future. However, it may break
more drivers than I observed on RPI. Therefore, I would like to propose
a compromise plan:
(a) Before branching netbsd-9, disable ARMV6_CACHED_MEMORY, and use
(b) After branching netbsd-9, enable ARMV6_CACHED_MEMORY, and stop using
(c) After debugging drivers, use cached memory for DMA unconditionally
on ARMv6+ and remove ARMV6_CACHED_DMA_MEMORY option.
Thoughts? Nick, does this look reasonable for you?
Let's backport all related fixes and plan to remove ARMV6_CACHED_DMA_MEMORY everywhere. (pending the rpi result)
Yeah, I would like to commit
the original patch
or the revised one (explained above)
if there's no objection until this weekend. Also, I will send a PR on
vchiq(4) problems. Which patch do you prefer? Or other ideas?
Main Index |
Thread Index |