port-macppc/55326: gem(4): memory corruption by RX DMA

To: port-macppc-maintainer%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: port-macppc/55326: gem(4): memory corruption by RX DMA
From: rokuyama.rk%gmail.com@localhost
Date: Sun, 31 May 2020 12:20:00 +0000 (UTC)

>Number:         55326
>Category:       port-macppc
>Synopsis:       gem(4): memory corruption by RX DMA
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-macppc-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun May 31 12:20:00 +0000 2020
>Originator:     Rin Okuyama
>Release:        9.99.64
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD macmini 9.99.64 NetBSD 9.99.64 (GENERIC) #65: Sun May 31 01:11:41 JST 2020  rin@latipes:/usr/src/sys/arch/macppc/compile/GENERIC macppc
>Description:
If DIAGNOSTIC is enabled on a machine with gem(4) (a Mac mini in my case),
the kernel panics as follows:

panic: pr_phinpage_check: [mclpl] item 0x3fb0b040 not part of pool
cpu0: Begin traceback...
...: at vpanic+...
...: at panic+...
...: at pool_cache_put_paddr+...
...: at m_ext_free+...
...: at m_freem.part.7+...
...: at ether_input+...
...: at if_percpuq_softint+...
...: at softint_dispatch+...
...: at softint_fast_dispatch+...
saved LR(0x1c) is invalid.cpu0: End traceback...

I found that the ph_page field had become NULL by the time this panic
occurred, whereas it was correctly initialized at the time of MCLGET(9).
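
To illustrate why a cleared ph_page leads to this panic, here is a simplified
userland sketch of that kind of check. It is not the code in subr_pool.c: the
4096-byte page size, the one-field header layout, and every sketch_* name are
assumptions made only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical in-page header stored at the very start of a pool page. */
struct sketch_page_header {
	void *ph_page;	/* back-pointer to the page this header describes */
};

/* Round an item address down to the start of the page that contains it. */
static void *
sketch_item_to_page(const void *item)
{
	return (void *)((uintptr_t)item & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1));
}

/*
 * Return true if the header at the start of the item's page still points
 * back at that page. A stray write that clears ph_page makes this fail.
 */
static bool
sketch_item_in_pool(const void *item)
{
	void *page = sketch_item_to_page(item);
	struct sketch_page_header ph;

	memcpy(&ph, page, sizeof(ph));	/* read the header without aliasing issues */
	return ph.ph_page == page;
}

/* Small demonstration: returns 0 if the check behaves as described. */
static int
sketch_demo(void)
{
	static _Alignas(SKETCH_PAGE_SIZE) unsigned char page[SKETCH_PAGE_SIZE];
	struct sketch_page_header ph = { .ph_page = page };
	void *item = page + 2048;	/* an item somewhere inside the page */

	memcpy(page, &ph, sizeof(ph));	/* header initialized at allocation */
	if (!sketch_item_in_pool(item))
		return 1;

	memset(page, 0, sizeof(ph));	/* simulate a stray write over the header */
	if (sketch_item_in_pool(item))
		return 2;

	return 0;
}
```

In this sketch, once the header is overwritten the item can no longer be
matched to its page, which corresponds to the "not part of pool" condition
in the panic message above.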

This dirty hack fixes the problem as far as I can see:

----
Index: sys/kern/uipc_mbuf.c
===================================================================
RCS file: /home/netbsd/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.241
diff -p -u -r1.241 uipc_mbuf.c
--- sys/kern/uipc_mbuf.c	5 May 2020 20:36:48 -0000	1.241
+++ sys/kern/uipc_mbuf.c	25 May 2020 14:08:51 -0000
@@ -188,8 +188,13 @@ mbinit(void)
 	    NULL, IPL_VM, mb_ctor, NULL, NULL);
 	KASSERT(mb_cache != NULL);
 
+#ifdef GEM_WORKAROUND /* XXXXXXXX */
+	mcl_cache = pool_cache_init(mclbytes, PAGE_SIZE, 0, 0, "mclpl",
+	    NULL, IPL_VM, NULL, NULL, NULL);
+#else
 	mcl_cache = pool_cache_init(mclbytes, COHERENCY_UNIT, 0, 0, "mclpl",
 	    NULL, IPL_VM, NULL, NULL, NULL);
+#endif
 	KASSERT(mcl_cache != NULL);
 
 	pool_cache_set_drain_hook(mb_cache, mb_drain, NULL);
----

Therefore, I guess that RX DMA of gem(4) corrupts memory in the page that
contains the DMA buffer, outside the buffer itself. However, no such
behavior is documented in the manual[1].

(It only recommends that buffers be aligned to a cache line (not mandatory),
and this holds even if DIAGNOSTIC is enabled: m_ext.ext_buf is aligned to
COHERENCY_UNIT = 64, which is larger than 32, the cache line size of the
Mac mini.)
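
The alignment argument is simple arithmetic: any address that is a multiple
of 64 is also a multiple of 32. A minimal sketch, in which the helper name
and macro names are hypothetical and the two constants are the values quoted
above:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_COHERENCY_UNIT	64u	/* alignment of m_ext.ext_buf, as quoted above */
#define SKETCH_CACHE_LINE	32u	/* cache line size of the Mac mini, as quoted above */

/* Return true if addr is a multiple of align (align must be a power of two). */
static bool
sketch_is_aligned(uintptr_t addr, uintptr_t align)
{
	return (addr & (align - 1)) == 0;
}
```

For example, the item address from the panic message, 0x3fb0b040, satisfies
both sketch_is_aligned(addr, SKETCH_COHERENCY_UNIT) and
sketch_is_aligned(addr, SKETCH_CACHE_LINE).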

[1] Sun Microsystems, Gigabit Ethernet ASIC Specification
>How-To-Repeat:
Described above.
>Fix:
N/A. Is this a hardware limitation? If so, should gem(4) use its own pool
for DMA buffers?
