Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)
On 2019/01/07 10:59, matthew green wrote:
> i fixed the hdafg.c ones here. not sure about the hdaudio.c
> ones, since they are already 1u << 31. leaving:
> beyond the xhci one, that actually doesn't matter since the
> alignment is not required in the copy of the structure.
Well, there are two different alignment problems. One is misalignment
within structures, which should be fixed in software. The other is one
that cannot be resolved in software, as pointed out by Michael.
Going back to the example of axe(4), the RX buffer generally contains more
than one ethernet frame. Each frame is preceded by a 4-byte H/W header.
The problem is that the H/W headers are 2-byte aligned instead of 4-byte
aligned, so __builtin_memcpy() ends up doing an unaligned word-wise load.
|H/W |ethernet | |H/W |ethernet
|hdr |frame | |hdr |frame
This kind of problem cannot be handled in software unless we
(1) use cached memory (for which unaligned access is allowed), or
(2) forbid the compiler from generating unaligned accesses.
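
To make the RX buffer layout concrete, here is a minimal sketch of what a
hand-written version of (2) might look like for a single header load. The
header format is hypothetical (a 4-byte little-endian word whose low 16 bits
hold the frame length, with frames padded to a 2-byte boundary); it is not
the real axe(4) format, and load_le32() and count_frames() are names made up
for this illustration. The pointer is volatile-qualified so that the
compiler should not merge the four byte loads back into one word load.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: 4 bytes, little-endian, low 16 bits = frame length. */
#define HDR_LEN 4u

/*
 * Read a 32-bit little-endian value one byte at a time, so that no
 * word-wide load is needed whatever the alignment of p.
 */
static uint32_t
load_le32(const volatile uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Walk an RX buffer of `total' bytes and count the complete frames in it.
 * Frames are assumed to be padded to a 2-byte boundary, which is why the
 * second and later headers may be only 2-byte aligned.
 */
static size_t
count_frames(const uint8_t *buf, size_t total)
{
	size_t off = 0, n = 0;

	while (off + HDR_LEN <= total) {
		size_t len = load_le32(buf + off) & 0xffffu;

		off += HDR_LEN;
		if (len == 0 || len > total - off)
			break;			/* empty or truncated frame */
		off += (len + 1) & ~(size_t)1;	/* skip frame and padding */
		n++;
	}
	return n;
}
```

Doing this by hand in every driver does not scale, which is why the two
options above are global ones.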
The patch adds an ARMV6_CACHED_DMA_MEMORY option. If enabled, DMA memory
is forcibly mapped cacheable on ARMv6+ [option (1) above]. This permits
unaligned access to DMA buffers but, as Nick and others pointed out, it
breaks drivers that do not invalidate the cache appropriately. If the
option is disabled, -mno-unaligned-access is added to CFLAGS [option (2)
above].
I've tested both on my RPI3B+ (earmv7hf). The kernel with (1) ran for more
than 12 hours without a panic. However, vchiq(4) fails to initialize, and
mue(4) received strange zero-length packets (twice in 12 hours). Both
smell like driver bugs. The kernel with (2) works without problems as far
as I can see.
I also ran a simple benchmark: building lang/perl5 from pkgsrc. The
working directory is on a USB SSD, TMPDIR is tmpfs, and the terminal is
ssh. The difference is negligible: (1) 25:36.53 and (2) 25:34.81, i.e.
about 1.7 seconds, or roughly 0.1%.
We should use cached memory for DMA in the future. However, it may break
more drivers than I observed on RPI. Therefore, I would like to propose
a compromise plan:
(a) Before branching netbsd-9, disable ARMV6_CACHED_DMA_MEMORY, and use
    -mno-unaligned-access.
(b) After branching netbsd-9, enable ARMV6_CACHED_DMA_MEMORY, and stop
    using -mno-unaligned-access.
(c) After debugging drivers, use cached memory for DMA unconditionally
on ARMv6+ and remove ARMV6_CACHED_DMA_MEMORY option.
Thoughts? Nick, does this look reasonable to you?