NetBSD-Bugs archive
Re: kern/46224: fatal page fault, kernfs_readdir()
The following reply was made to PR kern/46224; it has been noted by GNATS.
From: Lars Heidieker <lars%heidieker.de@localhost>
To: gnats-bugs%NetBSD.org@localhost, oster%cs.usask.ca@localhost
Cc:
Subject: Re: kern/46224: fatal page fault, kernfs_readdir()
Date: Thu, 22 Mar 2012 17:58:39 +0100
On 03/19/2012 03:55 PM, Greg Oster wrote:
> The following reply was made to PR kern/46224; it has been noted by GNATS.
>
> From: Greg Oster <oster%cs.usask.ca@localhost>
> To: gnats-bugs%NetBSD.org@localhost
> Cc:
> Subject: Re: kern/46224: fatal page fault, kernfs_readdir()
> Date: Mon, 19 Mar 2012 08:53:57 -0600
>
> On Mon, 19 Mar 2012 02:30:01 +0000 (UTC)
> Petar Bogdanovic <petar%smokva.net@localhost> wrote:
>
> > >Number: 46224
> > >Category: kern
> > >Synopsis: fatal page fault, kernfs_readdir()
> > >Confidential: no
> > >Severity: critical
> > >Priority: medium
> > >Responsible: kern-bug-people
> > >State: open
> > >Class: sw-bug
> > >Submitter-Id: net
> > >Arrival-Date: Mon Mar 19 02:30:01 +0000 2012
> > >Originator: Petar Bogdanovic
> > >Release: NetBSD 6.0_BETA (16.03.2012)
> > >Organization:
> > >Environment:
> > amd64
> > >Description:
> > a pretty recent netbsd-6 kernel (date: 16.03., arch: amd64)
> > just crashed several times. The bug seems reproducible and does not
> > appear, when no kernfs is involved:
> >
> > $ mount
> > /dev/raid0a on / type ffs (log, NFS exported, local)
> > kernfs on /kern type kernfs (local)
> >
> > $ sudo find / -name '*,v'
> > /etc/mtree/special.local,v
> > (...many more lines...)
> > /var/backups/boot.cfg.current,v
> > uvm_fault(0xfffffe8114c4dbd0, 0x0, 1) -> e
> > fatal page fault in supervisor mode
> > trap type 6 code 0 rip ffffffff804f4ceb cs 8 rflags 10297
> > cr2 0 cpl 0 rsp fffffe80016077a0
> > kernel: page fault trap, code=0
> > Stopped in pid 847.1 (find) at netbsd:kernfs_readdir+0x687: movq 7fb0b30e(%rip),%rdi
> > db{1}> bt
> > kernfs_readdir() at netbsd:kernfs_readdir+0x687
> > VOP_READDIR() at netbsd:VOP_READDIR+0x65
> > vn_readdir() at netbsd:vn_readdir+0xf6
> > sys___getdents30() at netbsd:sys___getdents30+0x76
> > syscall() at netbsd:syscall+0xc4
> >
> >
> > The same situation yields a slightly different result when
> > ddb.onpanic=0 and ends with what seems to be a complete
> > meltdown after the core was successfully dumped:
> >
> > uvm_fault(0xfffffe811556ad40, 0x0, 1) -> e
> > fatal page fault in supervisor mode
> > trap type 6 code 0 rip ffffffff804f4ceb cs 8 rflags 10297
> > cr2 0 cpl 0 rsp fffffe80015b77a0
> > panic: trap
> > cpu1: Begin traceback...
> > printf_nolog() at netbsd:printf_nolog
> > startlwp() at netbsd:startlwp
> > alltraps() at netbsd:alltraps+0xa2
> > VOP_READDIR() at netbsd:VOP_READDIR+0x65
> > vn_readdir() at netbsd:vn_readdir+0xf6
> > sys___getdents30() at netbsd:sys___getdents30+0x76
> > syscall() at netbsd:syscall+0xc4
> > cpu1: End traceback...
> >
> > (..dump begins, finishes..)
> >
> > pmap_kenter_pa: mapping already present
> > pmap_kenter_pa: mapping already present
> > pmap_kenter_pa: mapping already present
> >
> > (..many, many more identical lines..)
> > (..takes as long as the core dump..)
> >
> > pmap_kenter_pa: mapping already present
> > pmap_kenter_pa: mapping already present
> > pmap_kenter_pa: mapping already present
> > succeeded
> >
> >
> > Skipping crash dump on recursive panic
> > panic: wdc_exec_command: polled command not done
> > cpu1: Begin traceback...
> > printf_nolog() at netbsd:printf_nolog
> > wdccommand() at netbsd:wdccommand
> > wd_flushcache() at netbsd:wd_flushcache+0xd7
> > wd_shutdown() at netbsd:wd_shutdown+0x3e
> > pmf_system_shutdown() at netbsd:pmf_system_shutdown+0x81
> > cpu_reboot() at netbsd:cpu_reboot+0x2c
> > vpanic() at netbsd:vpanic+0x1dd
> > printf_nolog() at netbsd:printf_nolog
> > startlwp() at netbsd:startlwp
> > alltraps() at netbsd:alltraps+0xa2
> > VOP_READDIR() at netbsd:VOP_READDIR+0x65
> > vn_readdir() at netbsd:vn_readdir+0xf6
> > sys___getdents30() at netbsd:sys___getdents30+0x76
> > syscall() at netbsd:syscall+0xc4
> > cpu1: End traceback...
> > rebooting...
> >
> > >How-To-Repeat:
> > find /kern -ls
> > >Fix:
> > none
>
> I don't know if I'm seeing quite the same error, but I've been chasing
> a similar issue the last few days... What I see is:
>
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 rip ffffffff80133415 cs e030 rflags 282 cr2 7f7ff7327080 cpl 0 rsp ffffa0005b72d9a0
> Stopped in pid 396.1 (find) at netbsd:breakpoint+0x5: leave
> breakpoint() at netbsd:breakpoint+0x5
> pool_cache_put_paddr() at netbsd:pool_cache_put_paddr+0x25
> static_qc_pools() at ffffffff80661100
> static_qc_pools() at ffffffff80661480
> Bad frame pointer: 0xffffffff8078a7e0
> ds ffff
> es a14a
> fs 0
> gs b278
> rdi 0
> rsi fffffffe
> rbp ffffa0005b72d9a0
> rbx ffffa0005b72dad0
> rdx 1000000
> rcx ffffa0000456b000
> rax ffffffff80d0b0c0
> r8 ffffa0000456b000
> r9 400
> r10 2
> r11 ffffa0000460308d
> r12 ffffa00004603000
> r13 ffffa0000739a870
> r14 ffffffffffffffff
> r15 ffffa00004603098
> rip ffffffff80133415 breakpoint+0x5
> cs e030
> rflags 282
> rsp ffffa0005b72d9a0
> ss e02b
> netbsd:breakpoint+0x5: leave
> db{3}>
>
> and I can trigger it on-demand with a: find -x / -name "ajsdf" -print
> The kernel is a netbsd-6 XEN3_DOMU kernel on amd64, with DEBUG and
> debug_freecheck turned on.
>
> Later...
>
> Greg Oster
>
>
If I haven't missed anything, debug_freecheck is broken. I hacked my way
around two problems. First, the logic that disables the check when it
runs out of slots is the wrong way round. Second, once that is
corrected, startup fails. After I circumvented that with a hack, the
system kept running until it ran out of slots (which I made panic so I
couldn't miss it).
The bug(s) that are out there aren't those indicated by debug_freecheck.
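
To illustrate the first problem, here is a minimal userland sketch of a free-check with a fixed slot table. It is not the NetBSD debug_freecheck code; FC_SLOTS, fc_slots, fc_enabled, fc_note_free() and fc_note_alloc() are hypothetical names made up for this example. It only shows the intended direction of the disable logic: checking is switched off when no empty slot is left, not when one is found.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch, not the NetBSD implementation. */
#define FC_SLOTS 4

static void *fc_slots[FC_SLOTS];   /* addresses currently known to be free */
static bool fc_enabled = true;     /* cleared once the table overflows */

/*
 * Record addr as freed.  Returns false if addr is already recorded as
 * free (a double free), true otherwise.
 */
static bool
fc_note_free(void *addr)
{
	size_t i, empty = FC_SLOTS;

	if (!fc_enabled)
		return true;            /* checking disabled: report nothing */

	for (i = 0; i < FC_SLOTS; i++) {
		if (fc_slots[i] == addr)
			return false;   /* already free: double free */
		if (fc_slots[i] == NULL && empty == FC_SLOTS)
			empty = i;      /* remember the first empty slot */
	}

	if (empty == FC_SLOTS) {
		/* No slot left: this is the case that must disable checking. */
		fc_enabled = false;
		return true;
	}

	fc_slots[empty] = addr;
	return true;
}

/* Forget addr when it is handed out again by the allocator. */
static void
fc_note_alloc(void *addr)
{
	size_t i;

	for (i = 0; i < FC_SLOTS; i++) {
		if (fc_slots[i] == addr) {
			fc_slots[i] = NULL;
			return;
		}
	}
}
```

With the test inverted (disabling when an empty slot is found), a check like this would switch itself off on the first free and stop reporting anything, while a full table would keep being searched for slots that do not exist.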
Lars