tech-kern archive


Re: uvm page freelist problem



On Wed, Jan 16, 2008 at 11:31:15AM +0100, Matthias Drochner wrote:
> I've got a panic "uvm_pglistalloc: page not on freelist"
> (in uvm_pglist.c:uvm_pglist_add())
> two times now, when starting X where the Intel AGP code
> does some larger pglistallocs.
> 
> By code inspection I haven't found a place where the
> pglists are accessed without acquiring the uvm_fpageqlock.
> 
> Can it happen that a page changes its (pg->flags & PG_ZERO)
> status in the process, which would change the page queue
> it is looked up on?
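
For reference, the DEBUG check that fires that panic picks the queue
roughly like this -- a paraphrased sketch from my reading of
uvm_pglist.c, not the verbatim source:

/*
 * Paraphrased sketch of the DEBUG check in uvm_pglist_add().
 * Note that the queue index is recomputed from pg->flags here.
 */
int free_list = uvm_page_lookup_freelist(pg);
int color = VM_PGCOLOR_BUCKET(pg);
int pgflidx = (pg->flags & PG_ZERO) ? PGFL_ZEROS : PGFL_UNKNOWN;
struct vm_page *tp;

TAILQ_FOREACH(tp, &uvm.page_free[free_list].pgfl_buckets[color].
    pgfl_queues[pgflidx], pageq) {
        if (tp == pg)
                break;
}
if (tp == NULL)
        panic("uvm_pglistalloc: page not on freelist");

So a PG_ZERO flip while the page sits on a free queue would indeed
send this scan down the wrong queue -- but since the zeroing code is
supposed to move the page between queues with uvm_fpageqlock held,
that would mean something is changing the flag behind the lock's back.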

This is probably not your problem, but I'm seeing similar behavior on
another (PPC) target.  What happens is that a freelist TAILQ pointer
becomes corrupted at some point, so the freelist scan aborts
prematurely and, depending on the exact nature of the corruption,
results in the above panic.

(gdb) x/14x 0x9001e310
0x9001e310:     0x9001e2d8      0x9001e348      0x00000000      0x00000000
0x9001e320:     0x00000000      0x00000000      0xdeadbeef      0xdeadbeef
0x9001e330:     0x00004000      0xdeadbeef      0x00211e10      0x00211e64
0x9001e340:     0x02598001      0xffffc000
(gdb)
0x9001e348:     0x0000000e      0x9001e380      0x00000000      0x00000000
0x9001e358:     0x00000000      0x00000000      0xdeadbeef      0xdeadbeef
0x9001e368:     0x00000000      0xdeadbeef      0x00000000      0x00000001
0x9001e378:     0x0259c000      0x00000000
(gdb)
0x9001e380:     0x9001e348      0x9001e3b8      0x00000000      0x00000000
0x9001e390:     0x00000000      0x00000000      0xdeadbeef      0xdeadbeef
0x9001e3a0:     0x00000000      0xdeadbeef      0x00000000      0x00000001
0x9001e3b0:     0x025a0000      0x00000000


Notice that the first word at 0x9001e348 is that page's
tp->pageq.tqe_next pointer; it should be 0x9001e310, but in the dump
above it reads 0x0000000e.
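
If it helps narrow things down, a crude consistency walk along these
lines can be dropped into suspect spots to catch the breakage closer
to where it happens.  This is only a sketch -- pgfl_check() is a
made-up name, and it assumes a plain TAILQ of struct vm_page linked
through pageq, with the caller holding uvm_fpageqlock:

/*
 * Sanity-walk one free queue: every element's tqe_prev must point
 * back at the previous element's tqe_next field.  pgfl_check() is
 * a made-up debugging helper; call it with uvm_fpageqlock held.
 */
static void
pgfl_check(struct pglist *q)
{
        struct vm_page *pg, *next;

        TAILQ_FOREACH(pg, q, pageq) {
                next = TAILQ_NEXT(pg, pageq);
                if (next != NULL &&
                    next->pageq.tqe_prev != &pg->pageq.tqe_next)
                        panic("pgfl_check: bad link %p <-> %p",
                            pg, next);
        }
}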

In another test run (so the corruption is different), looking at the
physical address instead of the virtual one, the corruption is still
apparent in my JTAG debugger:

(at pool_init())
KERN>md 0x256348
00256348 : 9001e310 9001e380 00000000 00000000  ................
00256358 : 00000000 00000000 deadbeef deadbeef  ................
00256368 : 00000000 deadbeef 00000000 00000001  ................
00256378 : 0259c000 00000000 9001e348 9001e3b8  .Y.........H....

(later, during one of my bus initializations)
KERN>md 0x256348
00256348 : 00008030 9001e380 00000000 00000000  ...0............
00256358 : 00000000 00000000 deadbeef deadbeef  ................
00256368 : 00000000 deadbeef 00000000 00000001  ................
00256378 : 0259c000 00000000 9001e348 9001e3b8  .Y.........H....

In my case, I can stimulate the corruption by shovelling kernel
printf()s into and out of various places in the kernel (places that
are not in the path of execution).  A hardware data watchpoint (byte
write, half-word write, or word write) doesn't trigger on the
corruption.
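
Since the hardware watch stays silent, another trick is a software
canary: snapshot the word that gets clobbered and recheck it at
interesting points during boot, to bisect when the corruption lands.
Again just a sketch; canary_arm() and canary_check() are made-up
names:

/*
 * Made-up canary helpers: remember the word that gets clobbered
 * and recheck it from interesting spots during boot.
 */
static volatile uint32_t *canary_p;
static uint32_t canary_val;

void
canary_arm(volatile uint32_t *p)
{
        canary_p = p;
        canary_val = *p;
}

void
canary_check(const char *where)
{
        if (canary_p != NULL && *canary_p != canary_val)
                panic("canary clobbered at %s: %08x != %08x",
                    where, *canary_p, canary_val);
}

If the value changes but a CPU-side watchpoint never fires, that
points away from a stray store and toward something like DMA or a
stale cache line being written back -- which would also fit the
printf() sensitivity, since extra printf()s change timing and data
layout.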

I'm still working the problem on my target, but hopefully something
above will give you some pointers (pun not intended) as to what's
happening on your end.




