NetBSD-Bugs archive
Re: kern/40408 (panic: PG_ZERO page isn't zero-filled)
On Fri, Jan 16, 2009 at 10:04:01AM +0000, Mindaugas Rasiukevicius wrote:
> > db{2}> bt
> > breakpoint() at netbsd:breakpoint+0x5
> > cpu_Debugger() at netbsd:cpu_Debugger+0x9
> > panic() at netbsd:panic+0x260
> > uvm_pagezerocheck() at netbsd:uvm_pagezerocheck+0x80
> > uvm_pagefree() at netbsd:uvm_pagefree+0x38c
> > pmap_update() at netbsd:pmap_update+0x75
> > pmap_destroy() at netbsd:pmap_destroy+0x16f
> > uvmspace_free() at netbsd:uvmspace_free+0xcc
> > uvm_proc_exit() at netbsd:uvm_proc_exit+0x9e
> > exit1() at netbsd:exit1+0x1c3
> > sys_exit() at netbsd:sys_exit+0x53
> > syscall() at netbsd:syscall+0xb6
>
> http://nxr.netbsd.org/source/xref/sys/arch/x86/x86/pmap.c#4704
>
> - Since swap of l->l_md.md_gc_ptp; value is not atomic, is it safe?
It's thread local and not accessed from interrupt context.
> - Could comment be added why (where?) pages are expected to be zero-filled,
> that is, why setting PG_ZERO?
Any freed ptp pages will be zeroed because all legitimate mappings will have
been removed by uvmspace_free(). All PTEs within the containing ptp pages will
have been zeroed.
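
To illustrate the invariant being asserted: a page flagged PG_ZERO is expected
to contain nothing but zero bytes when it is freed, so a single stale PTE left
in a ptp is enough to trip the check. The following is only a sketch of that
kind of check, not the actual uvm_pagezerocheck() code; the function name and
SKETCH_PAGE_SIZE are made up for the example.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size, for illustration only */

/*
 * Scan one page and return the offset of the first non-zero byte,
 * or -1 if the whole page is zero-filled.  A debug check would
 * panic when the result is not -1.
 */
static ptrdiff_t
sketch_first_nonzero(const void *page)
{
	const unsigned char *p = page;
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i++) {
		if (p[i] != 0)
			return (ptrdiff_t)i;
	}
	return -1;
}
```

The returned offset, divided by the PTE size, would indicate which slot in the
ptp was left non-zero, which can help narrow down which mapping was involved.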
The crash could be due to:
- A weird 'side band' access: pmap level mappings made directly by a thread
within the process, to the user-space component of the address space,
mappings which are not later manually removed. This would be a serious bug
and so is unlikely to be the cause of the problem.
- A non-atomic update to a PTE made by the kernel, racing with a PTE
writeback from another CPU (CPU initiated, not software initiated). This
is also unlikely to be the problem, but I would inspect bus_dma.c and
bus_space.c -- I remember seeing a non-atomic update somewhere, possibly
there.
- A CPU bug where a PTE that has been zeroed atomically by software is later
written back by a processor in order to set the A or D flag. I would check
the errata book for this processor. The notes from both Intel and AMD
regularly describe bugs affecting the TLB and paging structures. I guess
that's because it's something they optimize aggressively.
- A hardware problem: bad memory, bad connections, unclean power, etc.
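
To make the second and third points concrete, here is a rough sketch of the
difference between a non-atomic and an atomic PTE clear, written with C11
atomics rather than the kernel's own primitives. The type and function names
are hypothetical and this is not the pmap.c code. The flag values are the x86
present, accessed and dirty bit positions.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_PTE_P	0x001ULL	/* present */
#define SKETCH_PTE_A	0x020ULL	/* accessed, set by the CPU */
#define SKETCH_PTE_D	0x040ULL	/* dirty, set by the CPU */

typedef _Atomic uint64_t sketch_pte_t;	/* hypothetical PTE type */

/*
 * Non-atomic clear: the load and the store are separate steps.
 * If another CPU updates the A or D flag between them, the value
 * returned here no longer matches what was in the PTE when it was
 * overwritten.
 */
static uint64_t
sketch_pte_clear_racy(sketch_pte_t *pte)
{
	uint64_t old = atomic_load_explicit(pte, memory_order_relaxed);
	/* window: a concurrent update to *pte lands here */
	atomic_store_explicit(pte, 0, memory_order_relaxed);
	return old;
}

/*
 * Atomic clear: fetch the old value and store zero in one
 * indivisible step, so there is no window on the software side.
 */
static uint64_t
sketch_pte_clear_atomic(sketch_pte_t *pte)
{
	return atomic_exchange(pte, 0);
}
```

With the atomic version, a PTE that is later found to be non-zero cannot be
explained by a software race in the clear itself, which is why the remaining
suspects are a stray writer elsewhere, a CPU erratum, or bad hardware.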
Andrew