NetBSD-Bugs archive

Re: kern/46224: fatal page fault, kernfs_readdir()



The following reply was made to PR kern/46224; it has been noted by GNATS.

From: Greg Oster <oster%cs.usask.ca@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: kern/46224: fatal page fault, kernfs_readdir()
Date: Mon, 19 Mar 2012 08:53:57 -0600

 On Mon, 19 Mar 2012 02:30:01 +0000 (UTC)
 Petar Bogdanovic <petar%smokva.net@localhost> wrote:
 
 > >Number:         46224
 > >Category:       kern
 > >Synopsis:       fatal page fault, kernfs_readdir()
 > >Confidential:   no
 > >Severity:       critical
 > >Priority:       medium
 > >Responsible:    kern-bug-people
 > >State:          open
 > >Class:          sw-bug
 > >Submitter-Id:   net
 > >Arrival-Date:   Mon Mar 19 02:30:01 +0000 2012
 > >Originator:     Petar Bogdanovic
 > >Release:        NetBSD 6.0_BETA (16.03.2012)
 > >Organization:
 > >Environment:
 > amd64
 > >Description:
 >      a pretty recent netbsd-6 kernel (date: 16.03., arch: amd64) just
 >      crashed several times.  The bug seems reproducible and does not
 >      appear when no kernfs is involved:
 > 
 >      $ mount
 >      /dev/raid0a on / type ffs (log, NFS exported, local)
 >      kernfs on /kern type kernfs (local)
 > 
 >      $ sudo find / -name '*,v'
 >      /etc/mtree/special.local,v
 >      (...many more lines...)
 >      /var/backups/boot.cfg.current,v
 >      uvm_fault(0xfffffe8114c4dbd0, 0x0, 1) -> e
 >      fatal page fault in supervisor mode
 >      trap type 6 code 0 rip ffffffff804f4ceb cs 8 rflags 10297 cr2  0 cpl 0 rsp fffffe80016077a0
 >      kernel: page fault trap, code=0
 >      Stopped in pid 847.1 (find) at netbsd:kernfs_readdir+0x687:    movq 7fb0b30e(%rip),%rdi
 >      db{1}> bt
 >      kernfs_readdir() at netbsd:kernfs_readdir+0x687
 >      VOP_READDIR() at netbsd:VOP_READDIR+0x65
 >      vn_readdir() at netbsd:vn_readdir+0xf6
 >      sys___getdents30() at netbsd:sys___getdents30+0x76
 >      syscall() at netbsd:syscall+0xc4
 > 
 > 
 >      The same situation yields a slightly different result when
 >      ddb.onpanic=0 and ends with what seems to be a complete meltdown
 >      after the core was successfully dumped:
 > 
 >      uvm_fault(0xfffffe811556ad40, 0x0, 1) -> e
 >      fatal page fault in supervisor mode
 >      trap type 6 code 0 rip ffffffff804f4ceb cs 8 rflags 10297 cr2  0 cpl 0 rsp fffffe80015b77a0
 >      panic: trap
 >      cpu1: Begin traceback...
 >      printf_nolog() at netbsd:printf_nolog
 >      startlwp() at netbsd:startlwp
 >      alltraps() at netbsd:alltraps+0xa2
 >      VOP_READDIR() at netbsd:VOP_READDIR+0x65
 >      vn_readdir() at netbsd:vn_readdir+0xf6
 >      sys___getdents30() at netbsd:sys___getdents30+0x76
 >      syscall() at netbsd:syscall+0xc4
 >      cpu1: End traceback...
 > 
 >      (..dump begins, finishes..)
 > 
 >      pmap_kenter_pa: mapping already present
 >      pmap_kenter_pa: mapping already present
 >      pmap_kenter_pa: mapping already present
 > 
 >      (..many, many more identical lines..)
 >      (..takes as long as the core dump..)
 > 
 >      pmap_kenter_pa: mapping already present
 >      pmap_kenter_pa: mapping already present
 >      pmap_kenter_pa: mapping already present
 >      succeeded
 > 
 > 
 >      Skipping crash dump on recursive panic
 >      panic: wdc_exec_command: polled command not done
 >      cpu1: Begin traceback...
 >      printf_nolog() at netbsd:printf_nolog
 >      wdccommand() at netbsd:wdccommand
 >      wd_flushcache() at netbsd:wd_flushcache+0xd7
 >      wd_shutdown() at netbsd:wd_shutdown+0x3e
 >      pmf_system_shutdown() at netbsd:pmf_system_shutdown+0x81
 >      cpu_reboot() at netbsd:cpu_reboot+0x2c
 >      vpanic() at netbsd:vpanic+0x1dd
 >      printf_nolog() at netbsd:printf_nolog
 >      startlwp() at netbsd:startlwp
 >      alltraps() at netbsd:alltraps+0xa2
 >      VOP_READDIR() at netbsd:VOP_READDIR+0x65
 >      vn_readdir() at netbsd:vn_readdir+0xf6
 >      sys___getdents30() at netbsd:sys___getdents30+0x76
 >      syscall() at netbsd:syscall+0xc4
 >      cpu1: End traceback...
 >      rebooting...
 > 
 > >How-To-Repeat:
 >      find /kern -ls
 > >Fix:
 >      none
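
The trap line in the first trace of the report can be read field by field. The following is a minimal sketch, not part of the original report, that splits such a line into named values. It assumes the line is a flat sequence of "name value" pairs, that "trap type" is a decimal number, and that all other values are hexadecimal; the function name parse_trap_line is made up for this illustration. On x86, cr2 holds the faulting address of a page fault, so cr2 == 0 is consistent with the 0x0 address shown in the uvm_fault() message.

```python
# Trap line copied from the first trace in the report, with its wrapped
# continuation joined back onto one line.
TRAP_LINE = ("trap type 6 code 0 rip ffffffff804f4ceb cs 8 "
             "rflags 10297 cr2  0 cpl 0 rsp fffffe80016077a0")


def parse_trap_line(line):
    """Split a 'name value name value ...' trap line into a dict of ints.

    Assumptions (not taken from the kernel source): 'trap type' is printed
    in decimal, every other field in hexadecimal.
    """
    tokens = line.split()
    # The line starts with the two-word key "trap type"; merge it into one.
    if tokens[:2] == ["trap", "type"]:
        tokens = ["trap_type"] + tokens[2:]
    fields = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        base = 10 if name == "trap_type" else 16
        fields[name] = int(value, base)
    return fields


if __name__ == "__main__":
    fields = parse_trap_line(TRAP_LINE)
    # cr2 is the faulting address of the page fault.
    print("fault address: %#x" % fields["cr2"])
    print("faulting rip:  %#x" % fields["rip"])
```

The same function can be applied to the trap line of any other trace that uses this layout.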
 
 I don't know if I'm seeing quite the same error, but I've been chasing
 a similar issue the last few days... What I see is:
 
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 rip ffffffff80133415 cs e030 rflags 282 cr2 7f7ff7327080 cpl 0 rsp ffffa0005b72d9a0
 Stopped in pid 396.1 (find) at netbsd:breakpoint+0x5:  leave
 breakpoint() at netbsd:breakpoint+0x5
 pool_cache_put_paddr() at netbsd:pool_cache_put_paddr+0x25
 static_qc_pools() at ffffffff80661100
 static_qc_pools() at ffffffff80661480
 Bad frame pointer: 0xffffffff8078a7e0
 ds          ffff
 es          a14a
 fs          0
 gs          b278
 rdi         0
 rsi         fffffffe
 rbp         ffffa0005b72d9a0
 rbx         ffffa0005b72dad0
 rdx         1000000
 rcx         ffffa0000456b000
 rax         ffffffff80d0b0c0
 r8          ffffa0000456b000
 r9          400
 r10         2
 r11         ffffa0000460308d
 r12         ffffa00004603000
 r13         ffffa0000739a870
 r14         ffffffffffffffff
 r15         ffffa00004603098
 rip         ffffffff80133415    breakpoint+0x5
 cs          e030
 rflags      282
 rsp         ffffa0005b72d9a0
 ss          e02b
 netbsd:breakpoint+0x5:  leave
 db{3}> 
 
 and I can trigger it on demand with:  find -x / -name "ajsdf" -print
 The kernel is a netbsd-6 XEN3_DOMU kernel on amd64, with DEBUG and
 debug_freecheck turned on.  
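
Both reproductions in this thread are find(1) runs, which read every directory and stat every entry. As an illustration only (it is not from either report), here is a minimal sketch of the same access pattern. The function name walk_and_stat is hypothetical, and the sketch assumes that os.scandir() ends up in the same directory-reading system call path that find uses. Unlike "find -x", it does not stop at file system boundaries.

```python
import os


def walk_and_stat(root):
    """Recursively read every directory under root and lstat each entry,
    roughly what `find root -ls` does. Returns the number of entries seen."""
    count = 0
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            with os.scandir(directory) as entries:
                for entry in entries:
                    count += 1
                    try:
                        # lstat the entry without following symlinks.
                        entry.stat(follow_symlinks=False)
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                    except OSError:
                        # Entry vanished or is unreadable; keep going.
                        continue
        except OSError:
            # Directory unreadable (permissions, races); skip it.
            continue
    return count


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(walk_and_stat(target), "entries visited under", target)
```

Pointing this at /kern on an affected kernel would presumably exercise the same readdir path as the find commands above, so it is only sensible on a machine that is allowed to crash.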
 
 Later...
 
 Greg Oster
 

