Subject: Re: Another serious bug in NetBSD-1.6.1
To: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
From: Rafal Boni <rafal@attbi.com>
List: port-i386
Date: 03/12/2003 20:53:15
[This was originally on port-i386... I'm adding tech-kern and port-sparc64
 to the CCs, since I'm seeing panics stemming from a similar code path on
 sparc64...]

In message <200303120150.h2C1oIa03881@lothlorien.nfbcal.org>, Brian writes: 

[...]
-> 	I'm getting a double panic which looks like:
-> uvm_fault(0xc05d7300, 0xffc00000, 0, 1) -> e
-> fatal page fault in supervisor mode
-> trap type 6 code 0 eip c0311347 cs 8 eflags 10202 cr2 ffc000c4 cpl 0
-> panic: trap
-> syncing disks... panic: lockmgr: locking against myself
-> 
-> The first argument in the uvm_fault message varies by 20 bytes or so, but
-> the other two arguments, along with the error code at the end, are always
-> the same.  The error code is EFAULT and the 0xffc00000 argument corresponds
-> to this definition in /usr/src/sys/arch/i386/i386/locore.s
-> 
-> /*
->  * APTmap, APTD is the alternate recursive pagemap.
->  * It's used when modifying another process's page tables.
->  *
->  * XXX 4 == sizeof pde
->  */
-> 	.set	_C_LABEL(APTmap),(PDSLOT_APTE << PDSHIFT)
-> 	.set	_C_LABEL(APTD),(_C_LABEL(APTmap) + PDSLOT_APTE * NBPG)
-> 	.set	_C_LABEL(APTDpde),(_C_LABEL(PTD) + PDSLOT_APTE * 4)
-> 
-> 
-> 	These panics occur when the syncer kernel thread is running.  
-> Specifically, genfs_putpages, which is called from ffs_putpages.
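
For reference, the address arithmetic behind the quoted locore.s definition
can be sketched as below.  This is only a rough illustration: the values of
PDSLOT_APTE, PDSHIFT and NBPG are my assumptions about a typical i386
configuration, not something taken from Brian's message, and the helper
names (aptmap_base, apte_index, apte_va) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed i386 values; none of these appear in the original message. */
#define PDSHIFT      22u    /* assumption: one page-directory entry maps 4 MB */
#define NBPG         4096u  /* assumption: 4 KB pages */
#define PDSLOT_APTE  1023u  /* assumption: alternate recursive slot is the last PDE */
#define PTE_SIZE     4u     /* the "4 == sizeof pde" noted in the quoted comment */

/* Base of the alternate recursive page-table window (APTmap). */
static uint32_t aptmap_base(void)
{
    return (uint32_t)PDSLOT_APTE << PDSHIFT;
}

/* Index of the PTE that an address inside the APTmap window refers to. */
static uint32_t apte_index(uint32_t fault_addr)
{
    assert(fault_addr >= aptmap_base());
    return (fault_addr - aptmap_base()) / PTE_SIZE;
}

/* Virtual address, in the other address space, that this PTE would map. */
static uint32_t apte_va(uint32_t fault_addr)
{
    return apte_index(fault_addr) * NBPG;
}
```

Under those assumptions aptmap_base() comes out to 0xffc00000, which matches
the second uvm_fault argument, and the cr2 value ffc000c4 would be PTE index
49, i.e. the entry for virtual address 0x31000 in the alternate pagemap.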

Interesting, I've been seeing panics on sparc64 in a similar code path.
The two panic messages and backtraces are noted below.  However, my machine
is running -current, not 1.6.x.

panic: kernel diagnostic assertion "(data & TLB_NFO) == 0" failed: file "/extra/src-current/sys/arch/sparc64/sparc64/pmap.c", line 2586

With a backtrace of:
pmap_clear_modify(93fb930, ffffffffffffe000, 1, c, 0, 1c09c80) at pmap_clear_modify+0x94
genfs_putpages(0, 11, 92223c0, 0, ffff0002, 11b0000) at genfs_putpages+0x4f8
ffs_putpages(92377d0, 1094ba4, 188, 1e3d000, 1863c00, 1821800) at ffs_putpages+0xdc
VOP_PUTPAGES(9a485b0, 0, 0, 11, 4df0e0, 0) at VOP_PUTPAGES+0x30
ffs_full_fsync(9237a90, 10012, 108, 1e3d000, 8d6000, 0) at ffs_full_fsync+0xa4
ffs_fsync(9237a90, 10945ac, 98, 1e3d000, 0, 8) at ffs_fsync+0x34      
VOP_FSYNC(9a485b0, 1e3bf80, 0, 0, 0, 8a0e9c0) at VOP_FSYNC+0x38       
ffs_sync(0, 3, 1e3bf80, 8a0e9c0, 1093160, 1866a70) at ffs_sync+0xf0
sync_fsync(9237d10, 10fe3ec, 98, 1e3d200, 11318cc, 1c09c80) at sync_fsync+0x6c
VOP_FSYNC(926ba40, 1e3bf80, 8, 0, 0, 8a0e9c0) at VOP_FSYNC+0x38       
sched_sync(180c800, 1808c00, 1806c00, 11b0800, 1863c00, 1821800) at sched_sync+0xf8

and:

trap type 0x34: pc=100a9b8 npc=100a9bc pstate=ffffffff90820006<PRIV,IE>
kernel trap 34: mem address not aligned
Stopped in pid 5.1 (ioflush) at pseg_get+0x3c:  ldxa            [%o2 + %g0] 20, %o2 
db> tr
genfs_putpages(0, 11, 92223c0, 0, ffff0002, 11b0000) at genfs_putpages+0x4f8 
ffs_putpages(92377d0, 1094ba4, 188, 1e3d000, 1863c00, 1821800) at ffs_putpages+0xdc
VOP_PUTPAGES(9228a00, 0, 0, 11, 0, ffffffffffffbf80) at VOP_PUTPAGES+0x30
ffs_full_fsync(9237a90, 10012, 108, 1e3ce00, ffff0002, 0) at ffs_full_fsync+0xa4
ffs_fsync(9237a90, 10945ac, 98, 1e3d000, 0, ffffffffffffc1d0) at ffs_fsync+0x33
VOP_FSYNC(9228a00, 1e3bf80, 0, 0, 0, 8a0e9c0) at VOP_FSYNC+0x38
ffs_sync(0, 3, 1e3bf80, 8a0e9c0, 1093160, 0) at ffs_sync+0xf0
sync_fsync(9237d10, 10fe3ec, 98, 1e3d200, 10ca2e4, 1c09c80) at sync_fsync+0x6c
VOP_FSYNC(93f2c40, 1e3bf80, 8, 0, 0, 8a0e9c0) at VOP_FSYNC+0x38       
sched_sync(180c800, 1808c00, 1806c00, 11b0800, 1863c00, 1821800) at sched_sync+0xf8

--rafal

----
Rafal Boni                                                     rafal@attbi.com
  We are all worms.  But I do believe I am a glowworm.  -- Winston Churchill