Subject: Re: 1.6ZI - pmap_clear_modify: pseg empty!
To: Rafal Boni <rafal@pobox.com>
From: Eduardo Horvath <eeh@NetBSD.org>
List: port-sparc64
Date: 01/27/2004 19:23:39

On Tue, Jan 27, 2004 at 12:29:21AM -0500, Rafal Boni wrote:
> In message <Pine.NEB.4.58.0401262225120.862@oni.i.purplei.com>, you write: 
> 
> -> 	Building pkgsrc on a newly installed 1.6ZI box and it managed to
> -> 	get itself into a loop. Unfortunately the loop ended with a panic,
> -> 	the first I've seen on a sparc64 box in a long while:
> -> 
> -> 	Does anyone have any thoughts/suggestions?

Actually that is not a panic.  It's a diagnostic assertion that's entering
the debugger.  

> 
> No, but I've seen this type of panic with a similar stack trace since
> maybe 1.6Q and just recently (within the last couple of weeks) got one
> with my 1.6ZE kernel.  The early history of this panic is that I think
> a bunch of work Chuq did on the pmap first made it much worse and then
> mostly vanquished it... I've seen it maybe 2 or 3 times outside that
> window where it seemed to happen mercilessly (which was sometime in
> the 1.6Q - 1.6T timeframe).

What's happening is that pmap_clear_modify() is called and finds a
mapping in the pv list for that page, but that mapping is not in the
appropriate page table.
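
A rough sketch of that failure mode, in plain user-space C.  Everything
here (the toy_* names, the flat array used as a page table, the flag
values) is made up for illustration and is not the sparc64 pmap code:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES       8      /* virtual pages per toy pmap (assumption) */
#define TOY_PTE_VALID    0x1u   /* hypothetical "mapping present" bit */
#define TOY_PTE_MODIFIED 0x2u   /* hypothetical "page was written" bit */

/* Toy stand-in for a pmap: a flat page table indexed by virtual page. */
struct toy_pmap {
    unsigned pte[TOY_NPAGES];
};

/* Toy pv entry: one (pmap, va) mapping of a given physical page. */
struct toy_pv {
    struct toy_pmap *pmap;
    size_t va;
    struct toy_pv *next;
};

/*
 * Walk the pv list of one physical page and clear the modified bit in
 * each mapping.  Returns the number of pv entries whose mapping is
 * missing from the page table; the real diagnostic would enter the
 * debugger at that point instead of counting.
 */
static int
toy_clear_modify(struct toy_pv *pv_head)
{
    int stale = 0;

    for (struct toy_pv *pv = pv_head; pv != NULL; pv = pv->next) {
        assert(pv->va < TOY_NPAGES);
        unsigned *pte = &pv->pmap->pte[pv->va];

        if ((*pte & TOY_PTE_VALID) == 0) {
            stale++;            /* pv entry exists, page table entry does not */
            continue;
        }
        *pte &= ~TOY_PTE_MODIFIED;
    }
    return stale;
}
```

With a pv list holding one entry for a mapped page and one for an
unmapped page, toy_clear_modify() clears the modified bit on the first
and reports one stale entry for the second.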

This can be caused by two things:

1) A race condition in pmap_enter()/pmap_remove() which results in a
pv entry but no page table entry.

2) An entry created by pmap_enter(), which creates a pv entry, but
destroyed through pmap_kremove(), which does not delete the pv entry.
This would indicate a bug in higher level code.

To diagnose this: pmap_kremove() only operates on the kernel pmap, so
check whether the pmap associated with that pv is the kernel pmap.  If
it is not, pmap_kremove() cannot be the culprit.
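
To illustrate case 2 and the kernel-pmap check, here is a toy model.
The toy_* functions only mimic the behaviour described above (enter
records a pv entry, kremove does not delete it); the names, table
sizes and data layout are assumptions, not the real implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8    /* virtual pages per toy pmap (assumption) */
#define TOY_MAXPV  8    /* pv slots for one physical page (assumption) */

struct toy_pmap {
    bool pte_valid[TOY_NPAGES];
};

struct toy_pv {
    struct toy_pmap *pmap;
    size_t va;
    bool in_use;
};

static struct toy_pmap toy_kernel_pmap;
static struct toy_pv toy_pv_table[TOY_MAXPV];  /* pv entries of one page */

/* Mimics pmap_enter(): installs the mapping AND records a pv entry. */
static bool
toy_enter(struct toy_pmap *pm, size_t va)
{
    assert(va < TOY_NPAGES);
    for (size_t i = 0; i < TOY_MAXPV; i++) {
        if (!toy_pv_table[i].in_use) {
            toy_pv_table[i] = (struct toy_pv){ pm, va, true };
            pm->pte_valid[va] = true;
            return true;
        }
    }
    return false;               /* no free pv slot */
}

/* Mimics pmap_kremove(): kernel pmap only, pv entry is left behind. */
static void
toy_kremove(size_t va)
{
    assert(va < TOY_NPAGES);
    toy_kernel_pmap.pte_valid[va] = false;
}

/*
 * Count pv entries with no matching mapping, and report how many of
 * those belong to the kernel pmap (the only ones kremove could cause).
 */
static size_t
toy_count_stale(size_t *kernel_owned)
{
    size_t stale = 0;

    *kernel_owned = 0;
    for (size_t i = 0; i < TOY_MAXPV; i++) {
        const struct toy_pv *pv = &toy_pv_table[i];

        if (!pv->in_use || pv->pmap->pte_valid[pv->va])
            continue;
        stale++;
        if (pv->pmap == &toy_kernel_pmap)
            (*kernel_owned)++;
    }
    return stale;
}
```

Calling toy_enter(&toy_kernel_pmap, 2) followed by toy_kremove(2)
leaves exactly one stale pv entry, and it is owned by the kernel pmap.
A stale entry owned by any other pmap would point back at case 1.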

Or you can enable pv_check(), which should verify that every pv entry
has a correct mapping and so trap the illegal pmap operation.  You would
need to add a pv_check() call at the end of pmap_kremove() and verify
that the calls in all the other functions are still in the correct
locations: I haven't used it in years and it has probably suffered from
bitrot.  Note that pv_check() is a very expensive operation, so you will
see a noticeable performance hit.
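
For a feel of why a check like this is expensive, here is a toy
consistency sweep.  This is not the actual pv_check() source; the
toy_* names, the single page table and the visit counter are all
assumptions made for the sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPHYS 4     /* physical pages tracked (assumption) */
#define TOY_NVA   16    /* virtual pages in the single toy pmap (assumption) */

struct toy_pv {
    size_t va;
    struct toy_pv *next;
};

static bool toy_pte_valid[TOY_NVA];             /* toy page table */
static struct toy_pv *toy_pv_head[TOY_NPHYS];   /* one pv list per page */
static size_t toy_pv_visited;                   /* pv entries inspected so far */

/*
 * Full sweep: every pv entry of every physical page must have a valid
 * mapping.  Returns the index of the first physical page that has a
 * stale pv entry, or -1 if everything is consistent.
 */
static int
toy_pv_check(void)
{
    for (size_t pa = 0; pa < TOY_NPHYS; pa++) {
        for (const struct toy_pv *pv = toy_pv_head[pa]; pv != NULL;
            pv = pv->next) {
            toy_pv_visited++;
            assert(pv->va < TOY_NVA);
            if (!toy_pte_valid[pv->va])
                return (int)pa;
        }
    }
    return -1;
}
```

Each call walks every pv list from the start, so running it at the end
of every pmap operation costs time proportional to the total number of
mappings per operation.  toy_pv_visited makes that cost visible in the
toy model.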

Eduardo