NetBSD-Bugs archive


Re: port-evbarm/55639: Assertion "anon != NULL && anon->an_ref != 0" fails on evbarm-earmv7hf



The following reply was made to PR port-evbarm/55639; it has been noted by GNATS.

From: Chuck Silvers <chuq%chuq.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: port-evbarm/55639: Assertion "anon != NULL && anon->an_ref != 0"
 fails on evbarm-earmv7hf
Date: Thu, 3 Sep 2020 21:10:09 -0700

 On Thu, Sep 03, 2020 at 06:55:00AM +0000, Andreas Gustafsson wrote:
 > >Synopsis:       Assertion "anon != NULL && anon->an_ref != 0" fails on evbarm-earmv7hf
 ...
 >   panic: kernel diagnostic assertion "uvm_pagelookup(uobj, offset) == NULL || ((a->ar_flags & UVM_PAGE_ARRAY_FILL_DIRTY) != 0 && !uvm_obj_page_dirty_p(pg))" failed: file "/tmp/bracket/build/2020.08.14.09.06.15-evbarm-earmv7hf/src/sys/uvm/uvm_vnode.c", line 321
 
 you're talking about two different assertions here.
 the one about "uvm_pagelookup ..." was fixed by rev 1.117 of uvm_vnode.c.
 the one about "anon != NULL ..." is completely different.
 
 I can reproduce the latter amap corruption problem, but only on certain
 arm boards.  a jetson tk1 does not hit it, but a cubietruck hits it quite easily.
 it's good to know that the emulated system in qemu can also hit it.
 it looks like the qemu configuration used by anita is trying to have
 two CPUs, but the second one isn't actually there:
 
 [   1.0000000] cpu1 at cpus0: disabled (unresponsive)
 
 that's helpful in that it tells us the bug is not an MP race.
 
 the nature of the amap corruption that I've seen on cubietruck is
 a bit-flip in one of the entries in the amap's am_slots[] array,
 which causes different symptoms depending on exactly what is in the amap.
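 
 As a rough illustration of what "a bit-flip in one entry" means, here is
 a minimal userspace sketch that compares an expected slot index with the
 value actually found. The helper names are hypothetical and are not part
 of UVM.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if 'seen' differs from 'expected'
 * in exactly one bit position.
 */
static bool
is_single_bit_flip(uint32_t expected, uint32_t seen)
{
	uint32_t diff = expected ^ seen;

	/* A nonzero value with one bit set shares no bits with itself minus one. */
	return diff != 0 && (diff & (diff - 1)) == 0;
}

/*
 * Hypothetical helper: position of the flipped bit (0 = least
 * significant), or -1 if the values do not differ by exactly one bit.
 */
static int
flipped_bit(uint32_t expected, uint32_t seen)
{
	uint32_t diff = expected ^ seen;
	int bit = 0;

	if (!is_single_bit_flip(expected, seen))
		return -1;
	while ((diff & 1) == 0) {
		diff >>= 1;
		bit++;
	}
	return bit;
}
```

 A flipped bit in a slot index makes the entry refer to a different slot,
 so what goes wrong next depends on what that other slot happens to hold.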
 
 I wrote some debug code to fully validate an amap immediately after
 locking it and immediately before unlocking it, and this problem is
 detected by the check immediately after locking the amap,
 i.e. the bit is being flipped while the amap is not locked,
 so it's very unlikely that the code that operates on amaps
 is causing the corruption.
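 
 The validate-on-lock / validate-on-unlock idea can be sketched in
 userspace as follows. This is not the actual debug code: the toy_*
 types and functions are hypothetical stand-ins, the field names are
 only modelled on the amap, and the invariants checked here are an
 assumption about what "fully validate" might cover.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields checked here. */
struct toy_anon {
	int an_ref;			/* reference count, 0 means not in use */
};

struct toy_amap {
	int am_nslot;			/* number of slots in the arrays */
	int am_nused;			/* number of entries used in am_slots[] */
	int *am_slots;			/* indices of the slots that are in use */
	int *am_bckptr;			/* slot -> position in am_slots[] */
	struct toy_anon **am_anon;	/* slot -> anon, NULL if unused */
	bool am_locked;			/* toy lock state, no real locking */
};

/* Hypothetical checker: true if the three arrays agree with each other. */
static bool
toy_amap_valid(const struct toy_amap *am)
{
	if (am->am_nused < 0 || am->am_nused > am->am_nslot)
		return false;
	for (int i = 0; i < am->am_nused; i++) {
		int slot = am->am_slots[i];

		if (slot < 0 || slot >= am->am_nslot)
			return false;	/* index out of range */
		if (am->am_bckptr[slot] != i)
			return false;	/* back pointer disagrees */
		if (am->am_anon[slot] == NULL || am->am_anon[slot]->an_ref == 0)
			return false;	/* no live anon in a used slot */
	}
	return true;
}

/* Validate right after taking the lock; false means damage while unlocked. */
static bool
toy_amap_lock(struct toy_amap *am)
{
	am->am_locked = true;
	return toy_amap_valid(am);
}

/* Validate right before dropping the lock; false means damage while locked. */
static bool
toy_amap_unlock(struct toy_amap *am)
{
	bool ok = toy_amap_valid(am);

	am->am_locked = false;
	return ok;
}
```

 If the unlock-side check always passes and the lock-side check fails,
 the damage happened in between, while no lock holder was touching
 the arrays.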
 
 I wrote some more debug code to make the mappings of all of the
 amap arrays read-only while the amap is not locked, but then
 I don't hit the problem.
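 
 The read-only-while-unlocked technique can be illustrated in userspace
 with mmap() and mprotect(). This is only an analogy, assuming a POSIX
 system: the kernel debug code would change kernel mappings by other
 means, and the guarded_* names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical wrapper around an int array living in its own pages. */
struct guarded_array {
	int *ga_data;		/* start of the mapping */
	size_t ga_len;		/* mapping length in bytes, a page multiple */
};

/* Allocate the array in private pages that start out read-only. */
static int
guarded_init(struct guarded_array *ga, size_t nelem)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = ((nelem * sizeof(int) + pagesz - 1) / pagesz) * pagesz;
	void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	ga->ga_data = p;
	ga->ga_len = len;
	return 0;
}

/* "Lock": make the pages writable for the duration of the update. */
static int
guarded_lock(struct guarded_array *ga)
{
	return mprotect(ga->ga_data, ga->ga_len, PROT_READ | PROT_WRITE);
}

/* "Unlock": make the pages read-only again. */
static int
guarded_unlock(struct guarded_array *ga)
{
	return mprotect(ga->ga_data, ga->ga_len, PROT_READ);
}

/* Release the mapping. */
static void
guarded_fini(struct guarded_array *ga)
{
	munmap(ga->ga_data, ga->ga_len);
	ga->ga_data = NULL;
	ga->ga_len = 0;
}
```

 With this arrangement a stray store through the mapping while the array
 is "unlocked" faults at the offending instruction instead of silently
 changing the data. It cannot catch a modification that does not go
 through that mapping.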
 
 today I tried running the atf tests on cubietruck again with
 the uvm/radixtree commit that you reference reverted, and I still hit
 the same assertion in amap_wipeout() that the anita harness did.
 so it appears that this is an old bug, which is perhaps made
 more likely to trigger an assertion by recent changes.
 
 -Chuck
 

