Subject: Re: Possible serious bug in NetBSD-1.6.1_RC2
To: Greg Oster <oster@cs.usask.ca>
From: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
List: tech-kern
Date: 04/09/2003 17:01:47
	Hello.  First, I'd like to thank everyone on the list who responded to
my e-mail.  Lots of people wondered if I might have a bad memory stick in
my machine.  I've now changed the memory SIMMs in my machine, and have
started seeing the same panic on another machine, so I don't think it's a
hardware problem.
	However, I do believe I see more clearly what's going on, though I
don't know how to fix it at the moment.  Perhaps my description will
trigger a thought from someone on this list.

	In case anyone is following this thread closely, I'm about to describe
what I believe is going on in the case of problem 1, the vm_fault panic.
	The panic occurs, very consistently, in pmap_change_attrs(), which is
in /usr/src/sys/arch/i386/i386/pmap.c.  What seems to be happening is that
this function begins doing its work and, in the middle of it, the CPU takes
a page fault.  At that point, the PTE the processor is working with is
invalid, and uvm_fault() panics.  I see that, in the middle of
pmap_change_attrs(), there is a diagnostic panic that fires if the PTE is
invalid at the time the function is called.  I don't have DIAGNOSTIC turned
on in my kernel, so I'll turn it on to determine whether the PTE is already
invalid when this function gets called or whether it gets invalidated while
the function is running.  If the latter, then I think we've got a lock that
isn't working somewhere because, if I understand things correctly, we
shouldn't take a page fault while we're in pmap_change_attrs().  Having
said that, I don't see where we disable interrupts or faults.  If it's the
former, and the PTE is already invalid on entry, can anyone think of why
that would be happening?
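
	To make the two cases concrete, here is a rough user-space sketch of
the distinction I'm trying to draw.  It is not the code from pmap.c: every
name in it (SK_PG_V, sk_change_attrs(), the "midway" hook) is made up for
illustration, and the hook merely stands in for whatever else might be
touching the PTE while the function runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from pmap.c. */
#define SK_PG_V 0x1u	/* "valid" bit in the toy PTE */

enum sk_result { SK_OK, SK_INVALID_ON_ENTRY, SK_INVALIDATED_DURING };

/* Optional hook run midway, simulating something else touching the PTE. */
typedef void (*sk_hook)(uint32_t *pte);

/* Example hook: clears the valid bit behind the caller's back. */
void
sk_clear_valid(uint32_t *pte)
{
	*pte &= ~SK_PG_V;
}

enum sk_result
sk_change_attrs(uint32_t *pte, uint32_t setbits, uint32_t clearbits,
    sk_hook midway)
{
	/* Case 1 ("the former"): the PTE is already invalid on entry. */
	if ((*pte & SK_PG_V) == 0)
		return SK_INVALID_ON_ENTRY;

	/* Window in which a missing lock would let the PTE change. */
	if (midway != NULL)
		midway(pte);

	/* Case 2 ("the latter"): valid on entry, invalid by now. */
	if ((*pte & SK_PG_V) == 0)
		return SK_INVALIDATED_DURING;

	/* Normal path: set and clear the requested attribute bits. */
	*pte = (*pte | setbits) & ~clearbits;
	return SK_OK;
}
```

	Calling sk_change_attrs() with a PTE of 0 gives the first case, and
passing sk_clear_valid as the hook on a valid PTE gives the second.  In the
real kernel the second check doesn't exist, which is why telling the two
cases apart requires DIAGNOSTIC.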

-thanks
-Brian