Current-Users archive
memory corruption [Re: try KMGUARD]
Here's another panic analysis. This one is quite interesting because,
if I got it right, the corruption occurs in the kernel's bss and not
in memory allocated by kmem.
The panic was:
fatal protection fault in supervisor mode
trap type 4 code 0 rip ffffffff8088b610 cs 8 rflags 10202 cr2 ffff80006cbb0000
cpl 0 rsp fffffe810cd96698
kernel: protection fault trap, code=0
Stopped in pid 19880.1 (vnconfig) at netbsd:strcmp+0x70: movb 0(%rdi),%al
db{0}> tr
strcmp() at netbsd:strcmp+0x70
vndioctl() at netbsd:vndioctl+0xd48
bdev_ioctl() at netbsd:bdev_ioctl+0x77
VOP_IOCTL() at netbsd:VOP_IOCTL+0x3b
vn_ioctl() at netbsd:vn_ioctl+0x76
sys_ioctl() at netbsd:sys_ioctl+0x13c
syscall() at netbsd:syscall+0xac
I suspect the strcmp() call is in fact in pool_init(). "show all pools"
panics in the same way, which confirms this:
db{0}> show all pools
[...]
POOL buf4k: size 4096, align 8, ioff 0, roflags 0x00000000
alloc 0xffffffff80e203a0
minitems 1, minpages 1, maxpages 1, npages 25
itemsperpage 1, nitems 1, nout 24, hardlimit 4294967295
nget 32, nfail 0, nput 8
npagealloc 33, npagefree 8, hiwat 26, nidle 1
POOLfatal protection fault in supervisor mode
trap type 4 code 0 rip ffffffff80719514 cs 8 rflags 10246 cr2 ffff80006cbb0000
cpl 8 rsp fffffe810cd96060
kernel: protection fault trap, code=0
Faulted in DDB; continuing...
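For readers who don't have the pool code in mind, here is a minimal sketch of why a sorted insert would fault inside strcmp(). It assumes pool_init() keeps the global pool list sorted by name by walking it and comparing names; struct demo_pool, dp_list, dp_name and demo_pool_insert() are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct pool: list linkage and name only. */
struct demo_pool {
	TAILQ_ENTRY(demo_pool) dp_list;	/* linkage first: tqe_next at offset 0 */
	const char *dp_name;
};

TAILQ_HEAD(demo_pool_head, demo_pool);

/*
 * Insert dp so that the list stays sorted by name.  Every element that
 * sorts before dp is dereferenced to read its name, so a single bad
 * tqe_next on that path makes strcmp() read through a bogus pointer.
 */
static void
demo_pool_insert(struct demo_pool_head *head, struct demo_pool *dp)
{
	struct demo_pool *it;

	TAILQ_FOREACH(it, head, dp_list) {
		if (strcmp(it->dp_name, dp->dp_name) > 0)
			break;
	}
	if (it == NULL)
		TAILQ_INSERT_TAIL(head, dp, dp_list);
	else
		TAILQ_INSERT_BEFORE(it, dp, dp_list);
}
```

A list dump walks the same next pointers, which would explain why "show all pools" faults right after printing buf4k.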
In the list, buf512b should come after buf4k. Fortunately, these
struct pools all come from bmempools[], so we can easily get the
addresses of the struct pool for buf4k and buf512b from there:
db{0}> sh pool 0xffffffff80e670a0
POOL buf4k: size 4096, align 8, ioff 0, roflags 0x00000000
alloc 0xffffffff80e203a0
minitems 1, minpages 1, maxpages 1, npages 25
itemsperpage 1, nitems 1, nout 24, hardlimit 4294967295
nget 32, nfail 0, nput 8
npagealloc 33, npagefree 8, hiwat 26, nidle 1
db{0}> sh pool 0xffffffff80e66bc0
POOL buf512b: size 512, align 8, ioff 0, roflags 0x00000000
alloc 0xffffffff80e203a0
minitems 1, minpages 1, maxpages 1, npages 1
itemsperpage 8, nitems 5, nout 3, hardlimit 4294967295
nget 4, nfail 0, nput 1
npagealloc 1, npagefree 0, hiwat 1, nidle 0
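ddb prints these addresses as offsets into bmempools (bmempools+0x4e0 for buf4k, bmempools itself for buf512b), so an address can be turned back into an array index. A minimal sketch; the base address is taken from the dumps in this message, but the 0x1a0 stride is only an assumption inferred from the offsets ddb shows (0x340, 0x4e0, 0x9c0 and 0xb60 are all multiples of it), not a value read from the headers. bmempool_index() is a hypothetical helper.

```c
#include <stdint.h>

#define BMEMPOOLS_BASE	0xffffffff80e66bc0ULL	/* netbsd:bmempools, from the dump */
#define POOL_STRIDE	0x1a0ULL		/* assumed sizeof(struct pool) */

/*
 * Return the bmempools[] index that addr corresponds to, or -1 if addr
 * is below the array or does not fall on an element boundary.
 */
static int
bmempool_index(uint64_t addr)
{
	uint64_t off;

	if (addr < BMEMPOOLS_BASE)
		return -1;
	off = addr - BMEMPOOLS_BASE;
	if (off % POOL_STRIDE != 0)
		return -1;
	return (int)(off / POOL_STRIDE);
}
```

With that stride, buf512b, buf2k, buf4k, buf32k and buf64k land at indexes 0, 2, 3, 6 and 7, which fits one pool per power-of-two buffer size.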
In struct pool, the list entry's next pointer (tqe_next) is at offset 0 and its previous pointer (tqe_prev) at offset 8:
db{0}> x/Lx 0xffffffff80e670a0
netbsd:bmempools+0x4e0: ffffffff80e66bc1
This doesn't look good: buf4k's struct pool is corrupted, since its
next pointer is not the address of buf512b's struct pool.
db{0}> x/Lx 0xffffffff80e670a8
netbsd:bmempools+0x4e8: ffffffff80e67580
db{0}> sh pool ffffffff80e67580
POOL buf32k: size 32768, align 8, ioff 0, roflags 0x00000000
alloc 0xffffffff80e27320
minitems 1, minpages 1, maxpages 1, npages 1
itemsperpage 2, nitems 2, nout 0, hardlimit 4294967295
nget 0, nfail 0, nput 0
npagealloc 1, npagefree 0, hiwat 1, nidle 1
But this is correct: buf32k comes before buf4k in the list (which
is sorted alphabetically).
db{0}> x/Lx 0xffffffff80e66bc8
netbsd:bmempools+0x8: ffffffff80e670a0
db{0}> sh pool ffffffff80e670a0
POOL buf4k: size 4096, align 8, ioff 0, roflags 0x00000000
alloc 0xffffffff80e203a0
minitems 1, minpages 1, maxpages 1, npages 25
itemsperpage 1, nitems 1, nout 24, hardlimit 4294967295
nget 32, nfail 0, nput 8
npagealloc 33, npagefree 8, hiwat 26, nidle 1
buf512b's previous pointer is also correct.
db{0}> x/Lx 0xffffffff80e66bc0
netbsd:bmempools: ffffffff80e67720
db{0}> sh pool ffffffff80e67720
POOL buf64k: size 65536, align 8, ioff 0, roflags 0x00000000
alloc 0xffffffff80e27320
minitems 1, minpages 1, maxpages 1, npages 1
itemsperpage 1, nitems 1, nout 0, hardlimit 4294967295
nget 0, nfail 0, nput 0
npagealloc 1, npagefree 0, hiwat 1, nidle 1
and buf512b's next pointer is also correct.
So most likely it's not a bad argument passed to a TAILQ_* macro;
something really did write to the next pointer in buf4k's struct pool.
What comes after the next pointer looks correct; let's look at what
comes before:
db{0}> sh pool 0xffffffff80e66f00
POOL buf2k: size 2048, align 8, ioff 0, roflags 0x00000000
alloc 0xffffffff80e203a0
minitems 1, minpages 1, maxpages 1, npages 203
itemsperpage 2, nitems 2, nout 404, hardlimit 4294967295
nget 412, nfail 0, nput 8
npagealloc 211, npagefree 8, hiwat 203, nidle 1
db{0}> x/Lx 0xffffffff80e67050
netbsd:bmempools+0x490: 0 #pr_log
db{0}> x/Lx 0xffffffff80e67058
netbsd:bmempools+0x498: 0 #pr_curlogentry
db{0}> x/Lx 0xffffffff80e6705c
netbsd:bmempools+0x49c: 0 #pr_logsize
db{0}> x/Lx 0xffffffff80e67060
netbsd:bmempools+0x4a0: 0 pr_entered_file
db{0}> x/Lx 0xffffffff80e67068
netbsd:bmempools+0x4a8: 0 pr_entered_line
db{0}> x/Lx 0xffffffff80e67070
netbsd:bmempools+0x4b0: 0 pr_reclaimerentry
db{0}> x/Lx 0xffffffff80e67078
netbsd:bmempools+0x4b8: 0
db{0}> x/Lx 0xffffffff80e67080
netbsd:bmempools+0x4c0: 0
db{0}> x/Lx 0xffffffff80e67088
netbsd:bmempools+0x4c8: 0
db{0}> x/Lx 0xffffffff80e67090
netbsd:bmempools+0x4d0: 0 pr_freecheck
db{0}> x/Lx 0xffffffff80e67098
netbsd:bmempools+0x4d8: 0 pr_qcache
They're all 0, as expected. So it looks like a single byte (or maybe
less) has been changed: ffffffff80e66bc0 has been changed to ffffffff80e66bc1.
And this is in the kernel's bss, not in memory managed by kmem or another
kernel memory allocator. So to me it looks more like an uninitialized
pointer being used somewhere than a use after free.
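To put a number on "a single byte (or maybe less)", one can compare the value found in buf4k's next pointer with the address of buf512b's struct pool shown in the dumps (0xffffffff80e66bc0). A minimal sketch; bits_changed() is a hypothetical helper.

```c
#include <stdint.h>

/* Count the bits that differ between the expected and the found value. */
static int
bits_changed(uint64_t expected, uint64_t found)
{
	uint64_t diff = expected ^ found;
	int n = 0;

	while (diff != 0) {
		diff &= diff - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```

For these two values the difference is the lowest bit only, which would fit a one-byte store or a bit set as well as a wider write.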
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference