Subject: kern/33076: reproducible pool free list corruption
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Martin Husemann <martin@aprisoft.de>
List: netbsd-bugs
Date: 03/14/2006 08:45:01
>Number:         33076
>Category:       kern
>Synopsis:       reproducible pool free list corruption
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Mar 14 08:45:01 +0000 2006
>Originator:     Martin Husemann
>Release:        NetBSD 3.99.16
>Organization:
>Environment:
System: NetBSD martins.aprisoft.de 3.99.16 NetBSD 3.99.16 (MARTINS) #0: Mon Mar 13 09:38:18 CET 2006 martin@martins.aprisoft.de:/usr/src/sys/arch/amd64/compile/MARTINS amd64
Architecture: x86_64
Machine: amd64
>Description:

On this machine, I can reliably panic the kernel:

pool_get(mbpl): free list modified: magic=ffffffff; page 0xffff800016ce9000;
item addr 0xffff800016ce9e00

(the backtrace varies, as expected with this kind of corruption)
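
For context, the kind of check that produces a "free list modified" message
can be sketched roughly as below. This is a simplified userland illustration
under my own assumptions, not the actual kernel pool code: ITEM_MAGIC,
struct free_item, struct free_list, item_put() and item_get() are hypothetical
names chosen for the example. The idea is that an item is stamped with a magic
word when it goes onto the free list and the stamp is verified when the item
is handed out again, so a value like magic=ffffffff suggests something wrote
to the item while it was free.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stamp value and names; not taken from the kernel sources. */
#define ITEM_MAGIC 0xdeadbeefU

struct free_item {
	uint32_t magic;			/* stamped while the item is free */
	struct free_item *next;		/* next free item, or NULL */
};

struct free_list {
	struct free_item *head;
};

/* Put an object back on the free list and stamp it with the magic word. */
static void
item_put(struct free_list *fl, void *obj)
{
	struct free_item *it = obj;

	it->magic = ITEM_MAGIC;
	it->next = fl->head;
	fl->head = it;
}

/*
 * Take an object off the free list.  Returns NULL if the list is empty
 * or if the head item's stamp was overwritten; *corrupt is set to 1 in
 * the latter case (a kernel would panic at that point instead).
 */
static void *
item_get(struct free_list *fl, int *corrupt)
{
	struct free_item *it = fl->head;

	*corrupt = 0;
	if (it == NULL)
		return NULL;
	if (it->magic != ITEM_MAGIC) {
		*corrupt = 1;
		return NULL;
	}
	fl->head = it->next;
	return it;
}
```

Note that a check of this shape only fires on the next allocation of the
damaged item, not at the moment of the stray write, which would be consistent
with the backtrace varying from one panic to the next.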

>How-To-Repeat:

I run two parallel cvs checkouts (one for xsrc, one for src) in /tmp.
That does it on this machine.

Previously I suspected random corruption, since the problem does not seem to
happen when I limit the usable RAM to 2GB. But after various other fixes, this
exact panic (with varying addresses, of course) is the only kernel panic still
happening.

So now I suspect some pretty volatile race condition instead.

>Fix:
Hints on how to panic closer to the culprit would be welcome ;-)