tech-kern: Re: pool problems, TAILQ, and more...

Subject: Re: pool problems, TAILQ, and more...
To: None <bgrayson@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 03/26/2000 22:30:49
one effective way to debug this would probably be to replace the pool
operations in uvm_swap.c with special debugging allocation code that puts
each object on its own page and guarantees that the virtual address used
by an object stays unmapped for "a while" after the object is freed.  then
whatever code tries to modify a freed object will fault and panic at that
access instead of silently corrupting the pool freelist.  these debugging
pool functions would look something like:

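void *
dpool_get(void)
{
	vaddr_t vaddr;
	struct vm_page *page;

	vaddr = [ reuse an old virtual address or allocate a new one with
		uvm_km_valloc_wait(kernel_map, PAGE_SIZE) ];
	/* uvm_pagealloc() can fail; wait for the pagedaemon and retry */
	while ((page = uvm_pagealloc(NULL, 0, NULL, 0)) == NULL)
		uvm_wait("dpoolget");
	pmap_enter(pmap_kernel(), vaddr, VM_PAGE_TO_PHYS(page),
		UVM_PROT_ALL, PMAP_WIRED | VM_PROT_READ | VM_PROT_WRITE);
	return (void *)vaddr;
}

void
dpool_put(void *v)
{
	vaddr_t vaddr = (vaddr_t)v;
	paddr_t paddr;
	struct vm_page *page;

	pmap_extract(pmap_kernel(), vaddr, &paddr);
	page = PHYS_TO_VM_PAGE(paddr);
	/* remove the mapping so any later access to the object faults */
	pmap_page_protect(page, VM_PROT_NONE);
	uvm_pagefree(page);	/* only the VA is quarantined, not the page */
	[ put vaddr on a list to avoid reusing it for "a while" ]
}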
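the same idea can be tried outside the kernel.  below is a minimal
userspace sketch, not the kernel code above: it assumes a POSIX system
with mmap() and mprotect(), and the names dpool_get/dpool_put (reused
here for the analogue), QUARANTINE_DEPTH and the fixed-size FIFO ring are
illustrative choices.  each object gets its own page; on free the page is
set to PROT_NONE but kept mapped in a quarantine ring, so its address
cannot be handed out again until QUARANTINE_DEPTH later frees push it
out, at which point it is unmapped.

```c
#define _DEFAULT_SOURCE 1	/* exposes MAP_ANON on glibc */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* hypothetical knob: how many frees a page waits before being unmapped */
#define QUARANTINE_DEPTH 64

static void *quarantine[QUARANTINE_DEPTH];	/* FIFO ring of freed pages */
static size_t q_head;	/* index of the oldest quarantined page */
static size_t q_count;	/* number of pages currently quarantined */

static size_t
page_size(void)
{
	return (size_t)sysconf(_SC_PAGESIZE);
}

/* Hand out a fresh zero-filled read/write page: one object per page. */
void *
dpool_get(void)
{
	void *p = mmap(NULL, page_size(), PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Revoke all access to the page and hold its address in quarantine. */
void
dpool_put(void *p)
{
	size_t ps = page_size();

	assert(p != NULL && (uintptr_t)p % ps == 0);
	/* any later load or store through p now faults */
	if (mprotect(p, ps, PROT_NONE) != 0)
		abort();
	if (q_count == QUARANTINE_DEPTH) {
		/* ring is full: the oldest page has waited long enough */
		(void)munmap(quarantine[q_head], ps);
		quarantine[q_head] = p;
		q_head = (q_head + 1) % QUARANTINE_DEPTH;
	} else {
		quarantine[(q_head + q_count) % QUARANTINE_DEPTH] = p;
		q_count++;
	}
}
```

after `p = dpool_get(); dpool_put(p);`, a stray write such as
`((int *)p)[2] = 0` faults right away (SIGSEGV, or SIGBUS on some
systems), so the core dump points at the offending code rather than at a
later pool_get().  keeping the freed page mapped PROT_NONE is what stops
mmap() from returning the same address during the quarantine.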


I'll have some time later this week if you'd like to pursue this and
want more help.

-Chuck



On Sun, Mar 26, 2000 at 09:58:24PM -0600, Brian C. Grayson wrote:
> On Sun, Mar 26, 2000 at 09:12:02AM -0800, Jason R Thorpe wrote:
> > On Sun, Mar 26, 2000 at 04:15:23AM -0600, Brian C. Grayson wrote:
> > 
> >  >   I've got a little more info on my panics.  It looks like
> >  > something is modifying the region after the pool_put.
> > 
> > What kind of item is it, again?  I.e. what pool does it come from?
> 
>   swp vnd.
> 
> > You might try enabling the pool logging stuff, and gathering get/put
> > logs for that pool.  With that info, you might be able to track down
> > the offending code pretty easily.
> 
>   Yes, that and using a GENERIC kernel rather than my custom
> config were some of my first steps, about 60 core dumps ago.  :(  The
> free'd buf is being modified between the put and the next get.
> I'm trying to track down exactly where, but am not getting very
> far, as I'm not exactly sure of how uvm and swap and ffs all
> interact at the buf level.  All the puts and gets are from
> uvm_swap.c, so the logging isn't too helpful in this case --
> there's only one place where swp vnds are got and put.
> 
>   The freed region is still intact when sw_reg_strategy() calls
> splx(), and in a swp buf pool_put().  But by the next logged
> message, it has been corrupted.
> 
>   For what it's worth, if I swap on /dev/wd1b instead of a file
> on /dev/wd1f, the system doesn't panic.
> 
>   Any suggestions?  So far, I've been sprinkling checks
> throughout some relevant routines that panic if the
> last-freed swp vnd has had its third word twiddled.  But it's
> very much a shotgun approach!
> 
>   TIA.
> 
>   Brian
> -- 
> "Old programmers never die.  They just branch to a new address."
> 						-Anonymous
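Brian's check on the third word of the last-freed swp vnd can be made a
little less of a shotgun by poisoning the whole object when it is freed
and checking every word from any routine under suspicion.  a minimal
userspace sketch follows; poison_on_free(), poison_check(), POISON_WORD
and the `skip` count of leading words (assumed to be the ones the pool
itself overwrites with its freelist linkage) are all hypothetical, and in
the kernel the fprintf() would become a printf() or panic().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define POISON_WORD 0xdeadbeefU	/* hypothetical fill pattern */

static uint32_t *last_freed;	/* most recently freed object */
static size_t last_nwords;	/* its size in 32-bit words */
static size_t last_skip;	/* leading words assumed owned by the pool */

/* Call just before handing obj back to the pool. */
void
poison_on_free(void *obj, size_t size, size_t skip)
{
	uint32_t *w = obj;
	size_t i, nwords = size / sizeof(uint32_t);

	assert(skip <= nwords);
	for (i = skip; i < nwords; i++)
		w[i] = POISON_WORD;
	last_freed = w;
	last_nwords = nwords;
	last_skip = skip;
}

/*
 * Call from any suspect routine.  Returns -1 if the poison is intact,
 * otherwise the index of the first modified word (0-based).
 */
long
poison_check(const char *where)
{
	size_t i;

	if (last_freed == NULL)
		return -1;
	for (i = last_skip; i < last_nwords; i++) {
		if (last_freed[i] != POISON_WORD) {
			fprintf(stderr, "%s: word %zu of freed %p is 0x%08x\n",
			    where, i, (void *)last_freed,
			    (unsigned)last_freed[i]);
			return (long)i;
		}
	}
	return -1;
}
```

calling poison_check() at the entry and exit of each routine on the suspect
path narrows things down to the first routine that sees damaged poison.
unlike the unmapping approach, this only detects the corruption after the
fact, so it finds the window rather than the faulting instruction.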