Re: kern/40408 (panic: PG_ZERO page isn't zero-filled)

To: Mindaugas Rasiukevicius <rmind%netbsd.org@localhost>
Subject: Re: kern/40408 (panic: PG_ZERO page isn't zero-filled)
From: Andrew Doran <ad%netbsd.org@localhost>
Date: Fri, 16 Jan 2009 13:11:39 +0000

On Fri, Jan 16, 2009 at 10:04:01AM +0000, Mindaugas Rasiukevicius wrote:

> > db{2}> bt
> > breakpoint() at netbsd:breakpoint+0x5
> > cpu_Debugger() at netbsd:cpu_Debugger+0x9
> > panic() at netbsd:panic+0x260
> > uvm_pagezerocheck() at netbsd:uvm_pagezerocheck+0x80
> > uvm_pagefree() at netbsd:uvm_pagefree+0x38c
> > pmap_update() at netbsd:pmap_update+0x75
> > pmap_destroy() at netbsd:pmap_destroy+0x16f
> > uvmspace_free() at netbsd:uvmspace_free+0xcc
> > uvm_proc_exit() at netbsd:uvm_proc_exit+0x9e
> > exit1() at netbsd:exit1+0x1c3
> > sys_exit() at netbsd:sys_exit+0x53
> > syscall() at netbsd:syscall+0xb6
> 
> http://nxr.netbsd.org/source/xref/sys/arch/x86/x86/pmap.c#4704
> 
> - Since the swap of the l->l_md.md_gc_ptp value is not atomic, is it safe?

It's thread local and not accessed from interrupt context.
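
As a rough illustration of why a plain read-then-clear is enough here, the
following user-space sketch detaches a per-thread list head without any
atomic operation. The struct and function names are hypothetical stand-ins,
not the actual pmap.c code; the only assumption is the one stated above, that
the field is touched solely by its owning thread and never from an interrupt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page table page queued for freeing. */
struct sketch_ptp {
	struct sketch_ptp *next;
};

/* Hypothetical stand-in for the per-thread (per-LWP) state. */
struct sketch_lwp {
	struct sketch_ptp *gc_ptp;	/* list of ptps awaiting release */
};

/*
 * Detach the whole list from the thread and hand it to the caller.
 * The load and the store are separate, non-atomic steps. That is safe
 * only because no other thread and no interrupt handler reads or writes
 * l->gc_ptp between them.
 */
static struct sketch_ptp *
sketch_detach_gc_list(struct sketch_lwp *l)
{
	struct sketch_ptp *head;

	assert(l != NULL);
	head = l->gc_ptp;
	l->gc_ptp = NULL;
	return head;
}
```

If the field were shared between CPUs or reachable from an interrupt handler,
the two steps would need to become a single atomic exchange.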

> - Could comment be added why (where?) pages are expected to be zero-filled,
>   that is, why setting PG_ZERO?

Any freed ptp pages should already be zero-filled, because all legitimate
mappings will have been removed by uvmspace_free(), so every PTE within the
containing ptp pages will have been zeroed.
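
For reference, the kind of check that fires in the backtrace can be modelled
in user space as a scan of the page for any non-zero word. This is a minimal
sketch, not the uvm_pagezerocheck() source: the function name and the 4096
byte page size are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * Return true if every 64-bit word of the page is zero.  A single stale
 * PTE left behind in a ptp page is enough to make this return false.
 */
static bool
sketch_page_is_zero(const void *page)
{
	const uint64_t *p = page;
	size_t i;

	assert(page != NULL);
	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(*p); i++) {
		if (p[i] != 0)
			return false;
	}
	return true;
}
```

A page that is freed with PG_ZERO set but fails a check of this kind still
holds at least one non-zero entry.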

The crash could be due to:

- A weird 'side band' access: pmap level mappings made directly by a thread
  within the process, to the user-space component of the address space,
  mappings which are not later manually removed. This would be a serious bug
  and so is unlikely to be the cause of the problem.

- A non-atomic update to a PTE made by the kernel, racing with a PTE
  writeback from another CPU (CPU initiated, not software initiated). This
  is also unlikely to be the problem, but I would inspect bus_dma.c and
  bus_space.c -- I remember seeing a non-atomic update somewhere, possibly
  there.

- A CPU bug where a PTE that has been zeroed atomically by software is later
  written back by a processor in order to set the A or D flag. I would check
  the errata book for this processor. Errata notes from both Intel and AMD
  regularly describe bugs affecting the TLB and paging structures, I guess
  because those are areas they optimize aggressively.

- A hardware problem: bad memory, bad connections, unclean power, etc.
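
To make the second and third cases concrete, here is a user-space model of
how a stale write-back can resurrect a PTE that has already been zeroed, next
to a compare-and-swap update that cannot. It is only a sketch: the names are
hypothetical, the interleaving is replayed sequentially rather than raced on
real CPUs, and the bit values follow the x86 PTE layout (present 0x001,
accessed 0x020, dirty 0x040).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PTE_P UINT64_C(0x001)	/* present */
#define SK_PTE_A UINT64_C(0x020)	/* accessed */
#define SK_PTE_D UINT64_C(0x040)	/* dirty */

/* Zero the entry in one atomic step and return what was there. */
static uint64_t
sk_pte_clear_atomic(_Atomic uint64_t *ptep)
{
	return atomic_exchange(ptep, 0);
}

/*
 * Replay of the hazardous interleaving: a writer samples the entry, the
 * entry is zeroed underneath it, and the writer then stores its stale
 * copy with a flag added.  The entry ends up non-zero again.
 */
static uint64_t
sk_stale_writeback(uint64_t *ptep, uint64_t flag)
{
	uint64_t snapshot = *ptep;	/* 1: writer samples the live entry */

	*ptep = 0;			/* 2: entry is zeroed by someone else */
	*ptep = snapshot | flag;	/* 3: writer stores its stale copy */
	return *ptep;
}

/*
 * Safe variant: set the flag only if the entry still holds the value the
 * writer sampled.  If it was zeroed in the meantime, the update fails and
 * the entry stays zero.
 */
static bool
sk_set_flag_cas(_Atomic uint64_t *ptep, uint64_t expected, uint64_t flag)
{
	return atomic_compare_exchange_strong(ptep, &expected,
	    expected | flag);
}
```

In this model the stale write-back leaves a non-zero word in what should be
an empty ptp page, which is the symptom the zero check reports.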

Andrew

