NetBSD-Bugs archive


Re: kern/46224: fatal page fault, kernfs_readdir()



The following reply was made to PR kern/46224; it has been noted by GNATS.

From: Lars Heidieker <lars%heidieker.de@localhost>
To: gnats-bugs%NetBSD.org@localhost, oster%cs.usask.ca@localhost
Cc: 
Subject: Re: kern/46224: fatal page fault, kernfs_readdir()
Date: Thu, 22 Mar 2012 17:58:39 +0100

 On 03/19/2012 03:55 PM, Greg Oster wrote:
 > The following reply was made to PR kern/46224; it has been noted by GNATS.
 > 
 > From: Greg Oster <oster%cs.usask.ca@localhost>
 > To: gnats-bugs%NetBSD.org@localhost
 > Cc: 
 > Subject: Re: kern/46224: fatal page fault, kernfs_readdir()
 > Date: Mon, 19 Mar 2012 08:53:57 -0600
 > 
 >  On Mon, 19 Mar 2012 02:30:01 +0000 (UTC)
 >  Petar Bogdanovic <petar%smokva.net@localhost> wrote:
 >  
 >  > >Number:         46224
 >  > >Category:       kern
 >  > >Synopsis:       fatal page fault, kernfs_readdir()
 >  > >Confidential:   no
 >  > >Severity:       critical
 >  > >Priority:       medium
 >  > >Responsible:    kern-bug-people
 >  > >State:          open
 >  > >Class:          sw-bug
 >  > >Submitter-Id:   net
 >  > >Arrival-Date:   Mon Mar 19 02:30:01 +0000 2012
 >  > >Originator:     Petar Bogdanovic
 >  > >Release:        NetBSD 6.0_BETA (16.03.2012)
 >  > >Organization:
 >  > >Environment:
 >  > amd64
 >  > >Description:
 >  >   A pretty recent netbsd-6 kernel (date: 16.03., arch: amd64)
 >  >   just crashed several times.  The bug seems reproducible and does not
 >  >   appear when no kernfs is involved:
 >  > 
 >  >   $ mount
 >  >   /dev/raid0a on / type ffs (log, NFS exported, local)
 >  >   kernfs on /kern type kernfs (local)
 >  > 
 >  >   $ sudo find / -name '*,v'
 >  >   /etc/mtree/special.local,v
 >  >   (...many more lines...)
 >  >   /var/backups/boot.cfg.current,v
 >  >   uvm_fault(0xfffffe8114c4dbd0, 0x0, 1) -> e
 >  >   fatal page fault in supervisor mode
 >  >   trap type 6 code 0 rip ffffffff804f4ceb cs 8 rflags 10297 cr2  0 cpl 0 rsp fffffe80016077a0
 >  >   kernel: page fault trap, code=0
 >  >   Stopped in pid 847.1 (find) at netbsd:kernfs_readdir+0x687:    movq 7fb0b30e(%rip),%rdi
 >  >   db{1}> bt
 >  >   kernfs_readdir() at netbsd:kernfs_readdir+0x687
 >  >   VOP_READDIR() at netbsd:VOP_READDIR+0x65
 >  >   vn_readdir() at netbsd:vn_readdir+0xf6
 >  >   sys___getdents30() at netbsd:sys___getdents30+0x76
 >  >   syscall() at netbsd:syscall+0xc4
 >  > 
 >  > 
 >  >   The same situation yields a slightly different result when
 >  >   ddb.onpanic=0 and ends with what seems to be a complete
 >  >   meltdown after the core was successfully dumped:
 >  > 
 >  >   uvm_fault(0xfffffe811556ad40, 0x0, 1) -> e
 >  >   fatal page fault in supervisor mode
 >  >   trap type 6 code 0 rip ffffffff804f4ceb cs 8 rflags 10297 cr2  0 cpl 0 rsp fffffe80015b77a0
 >  >   panic: trap
 >  >   cpu1: Begin traceback...
 >  >   printf_nolog() at netbsd:printf_nolog
 >  >   startlwp() at netbsd:startlwp
 >  >   alltraps() at netbsd:alltraps+0xa2
 >  >   VOP_READDIR() at netbsd:VOP_READDIR+0x65
 >  >   vn_readdir() at netbsd:vn_readdir+0xf6
 >  >   sys___getdents30() at netbsd:sys___getdents30+0x76
 >  >   syscall() at netbsd:syscall+0xc4
 >  >   cpu1: End traceback...
 >  > 
 >  >   (..dump begins, finishes..)
 >  > 
 >  >   pmap_kenter_pa: mapping already present
 >  >   pmap_kenter_pa: mapping already present
 >  >   pmap_kenter_pa: mapping already present
 >  > 
 >  >   (..many, many more identical lines..)
 >  >   (..takes as long as the core dump..)
 >  > 
 >  >   pmap_kenter_pa: mapping already present
 >  >   pmap_kenter_pa: mapping already present
 >  >   pmap_kenter_pa: mapping already present
 >  >   succeeded
 >  > 
 >  > 
 >  >   Skipping crash dump on recursive panic
 >  >   panic: wdc_exec_command: polled command not done
 >  >   cpu1: Begin traceback...
 >  >   printf_nolog() at netbsd:printf_nolog
 >  >   wdccommand() at netbsd:wdccommand
 >  >   wd_flushcache() at netbsd:wd_flushcache+0xd7
 >  >   wd_shutdown() at netbsd:wd_shutdown+0x3e
 >  >   pmf_system_shutdown() at netbsd:pmf_system_shutdown+0x81
 >  >   cpu_reboot() at netbsd:cpu_reboot+0x2c
 >  >   vpanic() at netbsd:vpanic+0x1dd
 >  >   printf_nolog() at netbsd:printf_nolog
 >  >   startlwp() at netbsd:startlwp
 >  >   alltraps() at netbsd:alltraps+0xa2
 >  >   VOP_READDIR() at netbsd:VOP_READDIR+0x65
 >  >   vn_readdir() at netbsd:vn_readdir+0xf6
 >  >   sys___getdents30() at netbsd:sys___getdents30+0x76
 >  >   syscall() at netbsd:syscall+0xc4
 >  >   cpu1: End traceback...
 >  >   rebooting...
 >  > 
 >  > >How-To-Repeat:
 >  >   find /kern -ls
 >  > >Fix:
 >  >   none
 >  
 >  I don't know if I'm seeing quite the same error, but I've been chasing
 >  a similar issue the last few days... What I see is:
 >  
 >  fatal breakpoint trap in supervisor mode
 >  trap type 1 code 0 rip ffffffff80133415 cs e030 rflags 282 cr2 7f7ff7327080 cpl 0 rsp ffffa0005b72d9a0
 >  Stopped in pid 396.1 (find) at netbsd:breakpoint+0x5:  leave
 >  breakpoint() at netbsd:breakpoint+0x5
 >  pool_cache_put_paddr() at netbsd:pool_cache_put_paddr+0x25
 >  static_qc_pools() at ffffffff80661100
 >  static_qc_pools() at ffffffff80661480
 >  Bad frame pointer: 0xffffffff8078a7e0
 >  ds          ffff
 >  es          a14a
 >  fs          0
 >  gs          b278
 >  rdi         0
 >  rsi         fffffffe
 >  rbp         ffffa0005b72d9a0
 >  rbx         ffffa0005b72dad0
 >  rdx         1000000
 >  rcx         ffffa0000456b000
 >  rax         ffffffff80d0b0c0
 >  r8          ffffa0000456b000
 >  r9          400
 >  r10         2
 >  r11         ffffa0000460308d
 >  r12         ffffa00004603000
 >  r13         ffffa0000739a870
 >  r14         ffffffffffffffff
 >  r15         ffffa00004603098
 >  rip         ffffffff80133415    breakpoint+0x5
 >  cs          e030
 >  rflags      282
 >  rsp         ffffa0005b72d9a0
 >  ss          e02b
 >  netbsd:breakpoint+0x5:  leave
 >  db{3}> 
 >  
 >  and I can trigger it on-demand with a:  find -x / -name "ajsdf" -print
 >  The kernel is a netbsd-6 XEN3_DOMU kernel on amd64, with DEBUG and
 >  debug_freecheck turned on.  
 >  
 >  Later...
 >  
 >  Greg Oster
 >  
 > 
 
 If I haven't missed anything, debug_freecheck is broken. I hacked my way
 around two problems. First, the logic that disables the check when it
 runs out of slots is the wrong way round. Second, once that is
 corrected, startup fails. After I circumvented that with a hack, the
 system kept running until it ran out of slots (which I made panic so I
 couldn't miss it).
 
 The bug(s) that are out there aren't those indicated by debug_freecheck.
 
 Lars
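
 The slot-exhaustion behaviour described above can be illustrated with a
 small userland sketch. This is not the NetBSD implementation: the names
 fc_state, fc_note_free, fc_note_alloc and FC_NSLOTS are hypothetical and
 exist only for illustration. It assumes a free-checker that records
 freed addresses in a fixed number of slots and switches itself off once
 no slot is left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FC_NSLOTS 4		/* hypothetical, deliberately tiny */

struct fc_slot {
	const void *addr;	/* address currently recorded as free */
	bool used;
};

struct fc_state {
	struct fc_slot slots[FC_NSLOTS];
	bool enabled;		/* cleared once tracking cannot be trusted */
};

enum fc_result { FC_OK, FC_DOUBLE_FREE, FC_DISABLED };

static void
fc_init(struct fc_state *fc)
{
	for (size_t i = 0; i < FC_NSLOTS; i++) {
		fc->slots[i].addr = NULL;
		fc->slots[i].used = false;
	}
	fc->enabled = true;
}

/* Record that addr was freed; report a second free of the same address. */
static enum fc_result
fc_note_free(struct fc_state *fc, const void *addr)
{
	struct fc_slot *spare = NULL;

	assert(addr != NULL);
	if (!fc->enabled)
		return FC_DISABLED;

	for (size_t i = 0; i < FC_NSLOTS; i++) {
		struct fc_slot *s = &fc->slots[i];

		if (s->used && s->addr == addr)
			return FC_DOUBLE_FREE;
		if (!s->used && spare == NULL)
			spare = s;
	}

	/*
	 * Out of slots: stop checking.  An inverted test here (disabling
	 * while a spare slot exists, or carrying on when none does) would
	 * be one form of "wrong way round" disable logic.
	 */
	if (spare == NULL) {
		fc->enabled = false;
		return FC_DISABLED;
	}

	spare->addr = addr;
	spare->used = true;
	return FC_OK;
}

/* Record that addr was handed out again; forget it if it was tracked. */
static void
fc_note_alloc(struct fc_state *fc, const void *addr)
{
	if (!fc->enabled)
		return;
	for (size_t i = 0; i < FC_NSLOTS; i++) {
		if (fc->slots[i].used && fc->slots[i].addr == addr) {
			fc->slots[i].used = false;
			return;
		}
	}
}
```

 In this sketch the checker goes quiet once the slots are exhausted, so
 later double frees pass unreported. The local hack mentioned in the
 message panicked at that point instead, which makes exhaustion
 impossible to miss.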
 

