Subject: tricky bug in ufs_lookup?
To: None <tech-kern@netbsd.org>
From: Karl Janmar <karl@utopiafoundation.org>
List: tech-kern
Date: 08/19/2005 18:33:34
Hi,

I occasionally get panics caused by a page fault in kernel mode.
This system runs the netbsd-3 tag as of 10 July.
I could not diff against anoncvs.netbsd.org as it is down. Am I missing some
vital update?

uname -a
NetBSD ngong.utopiafoundation.org 3.0_BETA NetBSD 3.0_BETA (NGONG) #32: Thu Jul
28 16:55:12 CEST 2005
karl@ngong.utopiafoundation.org:/usr/src/sys/arch/i386/compile/NGONG i386

Unfortunately I can't get gdb to generate a backtrace.
It says:
(gdb) target kcore /archive/crashes/netbsd.14.core
can not access 0x4, invalid translation (invalid PTE)
can not access 0x4, invalid translation (invalid PTE)
warning: cannot read switchframe registers
#0  0x00000008 in ?? ()
It seems like something is trashed: when I print msgbufp it reads zero, even
though the message buffer is intact, as the dmesg output below shows.
(gdb) print msgbufp
$5 = (struct kern_msgbuf *) 0x0


dmesg -M netbsd.14.core reveals:
...[cut]....
uvm_fault(0xd8ebb380, 0x5c5000, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c0377ffc cs 8 eflags 10296 cr2 5c51d0 ilevel 0
panic: trap
syncing disks... uvm_fault(0xd8ebb380, 0x5c5000, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c035b275 cs 8 eflags 10202 cr2 5c51d8 ilevel 0
panic: trap

dumping to dev 0,6 offset 4001797
....[cut]....
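
The fields on a "trap type" line can be pulled apart mechanically. Below is a minimal Python sketch, not part of the original report, that assumes 4 KiB pages and the standard x86 page-fault error code bits (bit 0 = page present, bit 1 = write, bit 2 = user mode). The helper name decode_trap is made up for illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on i386


def decode_trap(line):
    """Extract the hex fields from a 'trap type ...' line of the dmesg output."""
    m = re.search(
        r"trap type (\d+) code ([0-9a-f]+) eip ([0-9a-f]+) .*cr2 ([0-9a-f]+)",
        line,
    )
    if m is None:
        raise ValueError("not a trap line")
    code = int(m.group(2), 16)
    cr2 = int(m.group(4), 16)
    return {
        "type": int(m.group(1)),
        "eip": int(m.group(3), 16),
        "cr2": cr2,
        # Page containing the faulting address.
        "fault_page": cr2 & ~(PAGE_SIZE - 1),
        # x86 page-fault error code bits.
        "page_present": bool(code & 0x1),
        "write": bool(code & 0x2),
        "user_mode": bool(code & 0x4),
    }


first = decode_trap(
    "trap type 6 code 0 eip c0377ffc cs 8 eflags 10296 cr2 5c51d0 ilevel 0"
)
print(hex(first["eip"]), hex(first["cr2"]), hex(first["fault_page"]))
# prints: 0xc0377ffc 0x5c51d0 0x5c5000
```

Under those assumptions, code 0 would read as a supervisor-mode read of a page that is not present, and the page address 0x5c5000 matches the second argument of the uvm_fault() line.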


However, I can resolve the vm_map that belongs to the page fault, but I don't
have a clue what to do with it:

(gdb) print *(struct vm_map *) 0xd8ebb380
$6 = {pmap = 0xe6964510, lock = {lk_interlock = {lock_data = 0}, lk_flags = 0,
    lk_sharecount = 0, lk_exclusivecount = 0, lk_recurselevel = 0,
    lk_waitcount = 0, lk_wmesg = 0xc079f46a "vmmaplk", lk_un = {lk_un_sleep = {
        lk_sleep_lockholder = -1, lk_sleep_locklwp = 0, lk_sleep_prio = 4,
        lk_sleep_timo = 0, lk_newlock = 0x0}, lk_un_spin = {
        lk_spin_cpu = 4294967295}}}, rbhead = {rbh_root = 0xd8e84420},
  header = {rb_entry = {rbe_left = 0x0, rbe_right = 0x0, rbe_parent = 0x0,
      rbe_color = 0}, ownspace = 0, space = 0, prev = 0xd8e848f0,
    next = 0xd8e84790, start = 3217031168, end = 0, object = {uvm_obj = 0x0,
      sub_map = 0x0}, offset = 0, etype = 0, protection = 0,
    max_protection = 0, inheritance = 0, wired_count = 0, aref = {
      ar_pageoff = 0, ar_amap = 0x0}, advice = 0, flags = 0 '\0'},
  nentries = 23, size = 38744064, ref_count = 1, ref_lock = {lock_data = 0},
  hint = 0xd8ebb3b4, hint_lock = {lock_data = 0}, first_free = 0xd8e84f20,
  flags = 65, flags_lock = {lock_data = 0}, timestamp = 143}
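
The decimal fields in this dump are easier to compare with the fault address once they are in hex. Below is a minimal Python sketch, not part of the original report, that assumes 4 KiB pages and assumes that header.start and header.end bound the address range the map covers (this has not been checked against uvm_map.h).

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on i386

# Values copied from the gdb output above.
header_start = 3217031168
header_end = 0
map_size = 38744064
fault_addr = 0x5c51d0  # cr2 from the first trap line

# Map size expressed in pages.
pages, remainder = divmod(map_size, PAGE_SIZE)

# Assumption: header.end and header.start are the lower and upper bounds.
inside = header_end <= fault_addr < header_start

print(hex(header_start))   # prints: 0xbfc00000
print(pages, remainder)    # prints: 9459 0
print(inside)              # prints: True
```

If that reading of the header fields is right, the faulting address lies inside the range this map describes, well below 0xbfc00000.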

Below is the code in my kernel that triggered the kernel fault. Is somebody able
to read something out of this? If you want more info, please email me and I
can provide it.

Objdump of the affected location:
c0377fd4 <ufs_lookup>:
c0377fd4:       55                      push   %ebp
c0377fd5:       89 e5                   mov    %esp,%ebp
c0377fd7:       57                      push   %edi
c0377fd8:       56                      push   %esi
c0377fd9:       53                      push   %ebx
c0377fda:       81 ec bc 00 00 00       sub    $0xbc,%esp
c0377fe0:       c7 45 d4 ff ff ff ff    movl   $0xffffffff,0xffffffd4(%ebp)
c0377fe7:       c7 45 ec 00 00 00 00    movl   $0x0,0xffffffec(%ebp)
c0377fee:       8b 45 08                mov    0x8(%ebp),%eax
c0377ff1:       8b 5d 08                mov    0x8(%ebp),%ebx
c0377ff4:       8b 4d 08                mov    0x8(%ebp),%ecx
c0377ff7:       8b 5b 0c                mov    0xc(%ebx),%ebx
c0377ffa:       8b 40 04                mov    0x4(%eax),%eax
c0377ffd:       8b 49 08                mov    0x8(%ecx),%ecx
c0378000:       8b 90 a0 00 00 00       mov    0xa0(%eax),%edx

and the second:

c035b0e3 <ffs_sync>:
c035b0e3:       55                      push   %ebp
c035b0e4:       89 e5                   mov    %esp,%ebp
c035b0e6:       57                      push   %edi
c035b0e7:       56                      push   %esi
c035b0e8:       53                      push   %ebx
c035b0e9:       83 ec 2c                sub    $0x2c,%esp
c035b0ec:       c7 45 e8 00 00 00 00    movl   $0x0,0xffffffe8(%ebp)
c035b0f3:       8b 45 08                mov    0x8(%ebp),%eax
c035b0f6:       8b b8 08 09 00 00       mov    0x908(%eax),%edi
c035b0fc:       8b 57 14                mov    0x14(%edi),%edx
c035b0ff:       89 55 ec                mov    %edx,0xffffffec(%ebp)
...[cut]....
c035b263:       89 44 24 08             mov    %eax,0x8(%esp)
c035b267:       8d 45 f0                lea    0xfffffff0(%ebp),%eax
c035b26a:       89 44 24 04             mov    %eax,0x4(%esp)
c035b26e:       8b 07                   mov    (%edi),%eax
c035b270:       89 04 24                mov    %eax,(%esp)
c035b273:       e8 f2 4a ff ff          call   c034fd6a <softdep_flushworklist>
c035b278:       85 c0                   test   %eax,%eax
c035b27a:       0f 44 45 e8             cmove  0xffffffe8(%ebp),%eax
c035b27e:       85 c0                   test   %eax,%eax
c035b280:       89 45 e8                mov    %eax,0xffffffe8(%ebp)
c035b283:       0f 85 c6 fe ff ff       jne    c035b14f <ffs_sync+0x6c>
c035b289:       8b 4d f0                mov    0xfffffff0(%ebp),%ecx
c035b28c:       85 c9                   test   %ecx,%ecx


Regards,
Karl Janmar
- karl@utopiafoundation.org