Port-arm archive


Re: aarch64 pmap tweaks for review



Noticeably faster with this patch applied, but I eventually hit this panic during a pkgsrc bulk build:

[ 2471.3017625] ubc_uiomove_direct: error=14
[ 2471.3017625] ubc_uiomove_direct: error=14
[ 2471.3117632] pid 27587 (as): user write of 98456@0xf9c3401b9000 at 142352 failed: 14
[ 2471.3217665] panic: Trap: Data Abort (EL1): Translation Fault L0 with read access for fffffffdfffff000: pc ffffc0000008da80: opcode f8647863: ldr x3, [x3,x4,lsl #3]

[ 2471.3317638] cpu17: Begin traceback...
[ 2471.3317638] trace fp ffffc008784cc5c0
[ 2471.3417639] fp ffffc008784cc5e0 vpanic() at ffffc000004b262c netbsd:vpanic+0x15c
[ 2471.3517632] fp ffffc008784cc650 panic() at ffffc000004b2724 netbsd:panic+0x44
[ 2471.3517632] fp ffffc008784cc6e0 data_abort_handler() at ffffc0000008c26c netbsd:data_abort_handler+0x4dc
[ 2471.3617726] tf ffffc008784cc760 el1_trap() at ffffc00000088b58 netbsd:el1_trap
[ 2471.3717676] ---- trapframe 0xffffc008784cc760 (304 bytes) ----
[ 2471.3817705]     pc=ffffc0000008da80,   spsr=0000000060000005
[ 2471.3817705]    esr=0000000096000004,    far=fffffffdfffff000
[ 2471.3917687]     x0=fffffffdfffff000,     x1=0000f9c340030000
[ 2471.3917687]     x2=ffffc008784ccb08,     x3=fffffffdfffff000
[ 2471.4017692]     x4=0000000000000000,     x5=0000000000200000
[ 2471.4017692]     x6=0000000000000001,     x7=0000000000000003
[ 2471.4117687]     x8=6bc3000f9c33fdf7,     x9=0000000000000050
[ 2471.4217694]    x10=0000000000000000,    x11=0000000000400000
[ 2471.4217694]    x12=0000f6ba5a73d000,    x13=0000f6ba5a73d000
[ 2471.4317696]    x14=0000000000000001,    x15=0000000000001002
[ 2471.4317696]    x16=0000f6ba5a73fd50,    x17=0000f6ba5a6cea74
[ 2471.4417700]    x18=0000000000000016,    x19=0000f9c340030000
[ 2471.4417700]    x20=ffff00009e378f80,    x21=0000000000000000
[ 2471.4517694]    x22=0000f9c340060000,    x23=0000000000000000
[ 2471.4617701]    x24=ffffc00000954f68,    x25=ffffc00000954af0
[ 2471.4617701]    x26=ffffc008784ccb58,    x27=ffff00886c76a400
[ 2471.4717698]    x28=ffff00009bc25b60, fp=x29=ffffc008784cca90
[ 2471.4717698] lr=x30=ffffc0000008e68c,     sp=ffffc008784cca90
[ 2471.4817706] ------------------------------------------------
[ 2471.4817706] fp ffffc008784cca90 _pmap_pte_lookup_bs() at ffffc0000008da80 netbsd:_pmap_pte_lookup_bs+0x68
[ 2471.4917705] fp ffffc008784ccaa0 _pmap_remove() at ffffc0000008e688 netbsd:_pmap_remove+0x80
[ 2471.5017713] fp ffffc008784ccb10 pmap_remove() at ffffc00000090b90 netbsd:pmap_remove+0x128
[ 2471.5117697] fp ffffc008784ccb60 uvm_unmap_remove() at ffffc000004222d8 netbsd:uvm_unmap_remove+0x258
[ 2471.5217700] fp ffffc008784ccbe0 uvmspace_free() at ffffc00000423010 netbsd:uvmspace_free+0xc8
[ 2471.5317697] fp ffffc008784ccc10 exit1() at ffffc00000457404 netbsd:exit1+0x174
[ 2471.5417700] fp ffffc008784ccd00 sigexit() at ffffc0000047d030 netbsd:sigexit+0x1e8
[ 2471.5417700] fp ffffc008784ccd50 postsig() at ffffc0000047d488 netbsd:postsig+0x280
[ 2471.5517694] fp ffffc008784cce20 lwp_userret() at ffffc00000461d60 netbsd:lwp_userret+0x1a8
[ 2471.5617691] fp ffffc008784cce70 trap_el0_sync() at ffffc0000008b5f8 netbsd:trap_el0_sync+0x448
[ 2471.5717692] tf ffffc008784cced0 el0_trap() at ffffc00000088bc4 netbsd:el0_trap
[ 2471.5817690] ---- trapframe 0xffffc008784cced0 (304 bytes) ----
[ 2471.5817690]     pc=0000fffff1e0df04,   spsr=0000000020000000
[ 2471.5917689]    esr=0000000092000001,    far=0000f9c340278000
[ 2471.5917689]     x0=00000002001885f0,     x1=0000f9c340278000
[ 2471.6017687]     x2=0000000000037a60,     x3=0000f9c3401e0000
[ 2471.6017687]     x4=0000000000000000,     x5=0000fffff1e22000
[ 2471.6117688]     x6=00000002001950e0,     x7=0000000001a742fc
[ 2471.6217685]     x8=0000000000000000,     x9=0000000000000000
[ 2471.6217685]    x10=0000000000000007,    x11=0000000000000001
[ 2471.6317686]    x12=000003e70d00b4c0,    x13=000003e70d00b4c5
[ 2471.6317686]    x14=0000000000000040,    x15=0000f9c3402d3150
[ 2471.6417686]    x16=0000000000000150,    x17=0000000000000000
[ 2471.6417686]    x18=0000000000011e88,    x19=0000000200101dc5
[ 2471.6517684]    x20=0000000200102760,    x21=0000000005854375
[ 2471.6617684]    x22=0000000000000018,    x23=00000002001885f0
[ 2471.6617684]    x24=0000fffff1e0e428,    x25=0000000000044660
[ 2471.6717684]    x26=0000f9c3402ce400,    x27=0000f9c3402ce000
[ 2471.6717684]    x28=0000000000000000, fp=x29=0000000000000000
[ 2471.6817682] lr=x30=0000fffff1e04d38,     sp=0000ffffff8a21b0
[ 2471.6817682] ------------------------------------------------
[ 2471.6917682] cpu17: End traceback...
Stopped in pid 27587.27587 (as) at netbsd:cpu_Debugger+0x4: ret

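The panic line can be cross-checked against the raw registers. Using the ESR_ELx field layout from the Arm Architecture Reference Manual, esr=0x96000004 decodes to EC 0x25 (data abort taken without a change in exception level), WnR 0 (a read) and DFSC 0b000100 (translation fault at level 0), which matches the message. The faulting `ldr x3, [x3,x4,lsl #3]` reads from x3 + (x4 << 3). With x4 = 0 that is exactly far=fffffffdfffff000, so the bad address is the base pointer in x3 itself rather than an out-of-range index. A small C sketch of that arithmetic follows; the macro and function names are made up for illustration and are not NetBSD's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ESR_ELx layout for data aborts (low 32 bits, per the Arm ARM):
 * EC = bits [31:26], IL = bit 25, ISS = bits [24:0]; inside the ISS,
 * WnR = bit 6 and DFSC = bits [5:0].  Macro names are illustrative only. */
#define ESR_EC(esr)   (((uint32_t)(esr) >> 26) & 0x3fu)
#define ESR_IL(esr)   (((uint32_t)(esr) >> 25) & 0x1u)
#define ESR_WNR(esr)  (((uint32_t)(esr) >> 6) & 0x1u)
#define ESR_DFSC(esr) ((uint32_t)(esr) & 0x3fu)

#define EC_DABT_LOWER_EL 0x24u  /* data abort from a lower EL (e.g. EL0) */
#define EC_DABT_SAME_EL  0x25u  /* data abort without a change in EL */

/* Translation faults are DFSC 0b0001LL, where LL is the lookup level 0-3.
 * Returns true and stores the level if dfsc is a translation fault. */
static bool dfsc_translation_level(uint32_t dfsc, unsigned *level)
{
    assert(level != NULL);
    if ((dfsc & 0x3cu) != 0x04u)
        return false;
    *level = dfsc & 0x3u;
    return true;
}

/* Effective address of "ldr xT, [xN, xM, lsl #3]": xN + (xM << 3). */
static uint64_t ldr_lsl3_address(uint64_t xn, uint64_t xm)
{
    return xn + (xm << 3);
}
```

For example, `ESR_EC(0x96000004u)` is 0x25 and `ESR_DFSC(0x96000004u)` is 0x04, while the EL0 frame's esr=0x92000001 gives EC 0x24, a data abort from the lower exception level.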
On Mon, 1 Jun 2020, Andrew Doran wrote:

Hi,

I made some tweaks to the aarch64 pmap based on lessons learned in the x86
pmap recently.  They reduce memory consumption and speed up things like
fork/exec/exit/UBC a little:

	http://www.netbsd.org/~ad/2020/aarch64.diff

Approximate times for kernel build on RK3399 with all 6 cores running at
600MHz:

	before	1354.07s real  6092.55s user  1591.35s system
	after	1307.90s real  6026.60s user  1432.83s system

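The before/after figures can be turned into relative savings with plain arithmetic on the numbers quoted above. The struct and helper below are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Times in seconds, copied from the RK3399 kernel build quoted above. */
struct timing {
    const char *what;
    double before, after;
};

static const struct timing rk3399_build[] = {
    { "real",   1354.07, 1307.90 },
    { "user",   6092.55, 6026.60 },
    { "system", 1591.35, 1432.83 },
};

static const size_t rk3399_build_rows =
    sizeof(rk3399_build) / sizeof(rk3399_build[0]);

/* Relative saving, in percent of the "before" time. */
static double pct_saved(const struct timing *t)
{
    assert(t->before > 0.0);
    return 100.0 * (t->before - t->after) / t->before;
}
```

With these inputs the savings come to about 3.4% of real time, 1.1% of user time and 10% of system time, so most of the improvement shows up in system time, which is where kernel pmap work is accounted.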
Description below.  Comments welcome.

Thanks,
Andrew

- Fix a lock order reversal via pmap_page_protect().

- Align struct pmap to a cache line boundary.

- Move the wired/resident count update out of the PMAPCOUNTERS #ifdef in one
 place.  It shouldn't depend on that option.

- Make sure the pmap is always locked when updating stats, so atomics are no
 longer needed to update them.

- Remove unneeded traversal of PV lists in pmap_enter_pv().

- Shrink struct vm_page from 136 to 128 bytes (cache line sized, to reduce
 cache misses).

- Shrink struct pv_entry from 48 to 32 bytes (power of 2 sized, to reduce
 cache misses).

- Embed a pv_entry in each vm_page.  That means PV entries don't need to be
 allocated for pages that are mapped only once, for example private
 anonymous memory / COW pages / most UBC mappings.  Dynamic PV entries are
 then used only for stuff like shared libraries and shared memory.

- Comment out PMAPCOUNTERS option because global counters are costly on MP
 due to cache coherency overhead.  The problem gets exponentially worse as
 more CPUs are added.

- Use the pmap as a source of pre-zeroed pages for the VM system.

- Do unlocked checks in pmap_page_protect() and pmap_clear_modify() to avoid
 taking the lock if the page has no mappings.

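Two of the items, aligning struct pmap to a cache line boundary and updating stats only with the pmap locked, can be sketched together in C11. Everything here is hypothetical: the field and function names, the 64-byte line size (an assumption) and the `atomic_flag` spinlock standing in for the pmap's real lock. Because every writer of the counters holds the lock, plain `+=` is sufficient and no atomic read-modify-write is needed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumption: 64-byte cache lines */

/* Hypothetical pmap stand-in.  Aligning the first member raises the
 * alignment of the whole struct, so every instance starts on its own
 * cache line and sizeof() is padded to a multiple of the line size. */
struct pmap {
    alignas(CACHE_LINE_SIZE) atomic_flag pm_lock; /* stands in for a mutex */
    bool pm_locked;          /* debug aid: true while pm_lock is held */
    long pm_resident_count;  /* protected by pm_lock */
    long pm_wired_count;     /* protected by pm_lock */
};

_Static_assert(alignof(struct pmap) == CACHE_LINE_SIZE, "pmap not line aligned");
_Static_assert(sizeof(struct pmap) % CACHE_LINE_SIZE == 0, "pmap not line padded");

static bool is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE_SIZE) == 0;
}

static void pmap_lock(struct pmap *pm)
{
    while (atomic_flag_test_and_set_explicit(&pm->pm_lock, memory_order_acquire))
        ;  /* spin until we are the one who set the flag */
    pm->pm_locked = true;
}

static void pmap_unlock(struct pmap *pm)
{
    pm->pm_locked = false;
    atomic_flag_clear_explicit(&pm->pm_lock, memory_order_release);
}

/* Caller must hold pm_lock.  All writers are serialized by the lock, so
 * ordinary arithmetic is enough; no atomic read-modify-write is needed. */
static void pmap_stat_update(struct pmap *pm, long resident_delta, long wired_delta)
{
    assert(pm->pm_locked);
    pm->pm_resident_count += resident_delta;
    pm->pm_wired_count += wired_delta;
}
```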

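Embedding one PV entry in each page's metadata means the common single-mapping case needs no allocation; dynamic entries are only chained on when a page gets a second mapping. A minimal sketch under stated assumptions: the struct layouts, field names and functions are hypothetical (not the layout in the diff), `malloc` stands in for the kernel's pool allocator, and locking and removal are omitted. The static assertion shows one way four pointer-sized fields give a 32-byte pv_entry on LP64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct pmap { int pm_id; };  /* hypothetical stub */

/* Hypothetical PV entry: four pointer-sized fields, 32 bytes on LP64. */
struct pv_entry {
    struct pv_entry *pv_next;  /* next mapping of the same page */
    struct pmap     *pv_pmap;  /* address space of this mapping; NULL = unused */
    uintptr_t        pv_va;    /* virtual address of this mapping */
    void            *pv_ptep;  /* hypothetical: pointer to the PTE */
};
_Static_assert(sizeof(void *) != 8 || sizeof(struct pv_entry) == 32,
               "pv_entry should be 32 bytes on LP64");

/* Hypothetical per-page metadata with the first PV entry embedded.
 * Invariant kept by this sketch: the embedded slot is empty only when
 * no dynamic entries are chained after it. */
struct page_md {
    struct pv_entry pvh_first;
};

/* Record a new mapping; returns 0 on success, -1 if allocation fails. */
static int pv_enter(struct page_md *md, struct pmap *pm, uintptr_t va)
{
    assert(pm != NULL);
    if (md->pvh_first.pv_pmap == NULL) {
        /* Common case: first mapping uses the embedded entry, no malloc. */
        md->pvh_first.pv_pmap = pm;
        md->pvh_first.pv_va = va;
        return 0;
    }
    struct pv_entry *pv = malloc(sizeof(*pv));
    if (pv == NULL)
        return -1;
    pv->pv_pmap = pm;
    pv->pv_va = va;
    pv->pv_ptep = NULL;
    /* Chain the dynamic entry right after the embedded one. */
    pv->pv_next = md->pvh_first.pv_next;
    md->pvh_first.pv_next = pv;
    return 0;
}

/* Number of mappings recorded for the page. */
static size_t pv_count(const struct page_md *md)
{
    if (md->pvh_first.pv_pmap == NULL)
        return 0;
    size_t n = 1;
    for (const struct pv_entry *pv = md->pvh_first.pv_next; pv != NULL;
         pv = pv->pv_next)
        n++;
    return n;
}
```

A real removal path would also have to keep the invariant above, for example by moving a dynamic entry into the embedded slot when the first mapping goes away.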

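The unlocked checks in pmap_page_protect() and pmap_clear_modify() follow a common pattern: read a cheap "has any mappings" indicator without the lock, return early if there are none, and only otherwise take the lock and do the real work. The sketch below is hypothetical: an `atomic_flag` spinlock stands in for the real lock, and a fixed array stands in for the PV list. It also rests on one stated assumption, namely that the caller's own locking already prevents a new mapping from appearing while the check runs; without that, the early return could miss a mapping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_MAPPINGS 4  /* hypothetical fixed capacity for the sketch */

/* Hypothetical page record; atomic_flag stands in for a kernel mutex. */
struct page {
    atomic_flag lock;
    atomic_uint nmappings;              /* written only with `lock` held */
    bool        writable[MAX_MAPPINGS]; /* per-mapping write bit, under `lock` */
};

static void page_lock(struct page *pg)
{
    while (atomic_flag_test_and_set_explicit(&pg->lock, memory_order_acquire))
        ;  /* spin */
}

static void page_unlock(struct page *pg)
{
    atomic_flag_clear_explicit(&pg->lock, memory_order_release);
}

/* Add a writable mapping (always takes the lock). */
static void page_map_writable(struct page *pg)
{
    page_lock(pg);
    unsigned n = atomic_load_explicit(&pg->nmappings, memory_order_relaxed);
    assert(n < MAX_MAPPINGS);
    pg->writable[n] = true;
    atomic_store_explicit(&pg->nmappings, n + 1, memory_order_release);
    page_unlock(pg);
}

/* Revoke write access from every mapping of the page.
 * Returns true if the lock had to be taken. */
static bool page_protect_readonly(struct page *pg)
{
    /* Unlocked check: no mappings means nothing to revoke, so skip the
     * lock entirely (assumes no mapping can be added concurrently). */
    if (atomic_load_explicit(&pg->nmappings, memory_order_acquire) == 0)
        return false;

    page_lock(pg);
    unsigned n = atomic_load_explicit(&pg->nmappings, memory_order_relaxed);
    for (unsigned i = 0; i < n; i++)
        pg->writable[i] = false;
    page_unlock(pg);
    return true;
}
```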
