Re: kern/43375: panic when mount(8)ing umass(4) device (Sony DSC-H50 Digital Camera)
The following reply was made to PR kern/43375; it has been noted by GNATS.
From: David Holland <dholland-bugs%netbsd.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc:
Subject: Re: kern/43375: panic when mount(8)ing umass(4) device (Sony DSC-H50 Digital Camera)
Date: Mon, 31 May 2010 02:09:15 +0000
On Sat, May 29, 2010 at 07:25:02PM +0000, Alan R. S. Bueno wrote:
> > Do you have that traceback, or can you get one from the dump?
>
> Yes. The system reboots after the panic, so no chance to ddb(4).
>
> Let me know if I'm doing something wrong.
You may be, because the trace doesn't entirely make sense.
> middle-earth# gunzip /var/crash/netbsd.5.core.gz
> middle-earth# crash -M netbsd.5.core
> Crash version 5.99.29, image version 5.99.29.
> System panicked: buf mem pool index %d
> Backtrace from time of crash is available.
This looks fine in principle; however, were you running the same
kernel that generated the dump? If it's not quite the same, that would
explain the odd-looking trace.
If in doubt, use the -N /netbsd.whatever option to crash to feed it the
same kernel that generated the core.
> crash> trace
> _KERNEL_OPT_NAPMBIOS(104,0,c0491beb,c09938a7,0,ccafb94c,104,0,0,ccafb94c) at 0
> procfs_vnodeop_p(104,0,c09b2180,cc91b2a0,0,2,0,ccafb958,0,0) at 0xcc91b2a0
These don't make sense being here, but they might just be junk left on
the stack (that's one of the hazards with ddb-style traces...).
procfs_vnodeop_p doesn't make sense either, because it's data, not code.
> panic(c09938a7,17,c23c32d0,0,0,cc5e0678,ccafb98c,c06707f3,cc5e0678,0) at 0xc05b4582
> buf_mempoolidx(cc5e0678,0,b0,c23c32d0,0,0,ccafb9bc,c0671b6b,c23c32d0,0) at 0xc066fe02
If buf_mempoolidx was really passed 0xcc5e0678, it's no wonder it
panicked. And this would in fact produce "index 23". However, I wonder
about this, for the reasons outlined below... but I don't know why ddb
would get the argument list wrong here, or, if it did, why this frame
would be wrong and not any of the others.
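For reference, here's a small standalone sketch of roughly how the pool
index gets computed (paraphrased from memory of sys/kern/vfs_bio.c, so
the constants and details are assumptions: a 512-byte minimum pool,
pools up to a 64k MAXBSIZE, and u_long modeled as 32 bits to match
i386). Both a garbage size like 0xcc5e0678 and, as it happens, a size
of 0 (which wraps to 0xffffffff after the initial decrement) come out
as index 23:
	/*
	 * Standalone sketch of roughly what buf_mempoolidx() does;
	 * this is a paraphrase, not the kernel source, and the
	 * constants are assumptions.
	 */
	#include <stdio.h>
	#include <stdint.h>

	#define MEMPOOL_SHIFT	9	/* assumed: smallest pool is 512 bytes */
	#define NMEMPOOLS	8	/* assumed: pools for 512 .. 64k */

	static unsigned
	mempoolidx(uint32_t size)	/* uint32_t to mimic a 32-bit u_long */
	{
		unsigned n = 0;

		size -= 1;		/* a size of 0 wraps to 0xffffffff here */
		size >>= MEMPOOL_SHIFT;
		while (size) {
			size >>= 1;
			n += 1;
		}
		return n;
	}

	int
	main(void)
	{
		uint32_t sizes[] = { 512, 8192, 0xcc5e0678, 0 };
		unsigned i, n;

		for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
			n = mempoolidx(sizes[i]);
			printf("size 0x%x -> index %u%s\n", (unsigned)sizes[i],
			    n, n >= NMEMPOOLS ? " (would panic)" : "");
		}
		return 0;
	}
So, taken at face value, either of those argument values would explain
the "buf mem pool index %d" panic; the question is which one was
actually passed.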
> allocbuf(c23c32d0,0,0,0,cc91b310,cc5e0678,0,ccafba7c,cc5e0678,0) at 0xc06707f3
allocbuf doesn't call buf_mempoolidx, but it does call all three of
the functions that do, two of which aren't called anywhere else, so
let's assume one of those was inlined.
The second argument is the new buffer size, and it's apparently 0.
If passed to buf_mempoolidx, this would cause the same panic, also
with index 23.
However, this isn't consistent with the previous line: while
0xcc5e0678 is in the argument list here, allocbuf only actually takes
three arguments, and it looks unlikely that it or any of the inlined
functions could synthesize 0xcc5e0678 to pass through to
buf_mempoolidx. Especially since that same value appears further
down as the first argument to getblk and bread, which suggests that
it's a vnode.
> getblk(cc5e0678,7a0,0,0,0,0,ccafb9fc,c066fe6e,cc5e0678,cc5e0678) at 0xc0671b6b
This appears to be asking for a buffer of length 0 for block 0x7a0 of
vnode 0xcc5e0678.
> bio_doread(0,ffffffff,0,c06700aa,c23c33d4,cc5e0678,cb18ce00,f0000,0,c204a600) at 0xc0671c7e
This makes no sense at all (block -1, size 0, of a null vnode) but one
needs to go through bio_doread to get to getblk from bread.
> bread(cc5e0678,7a0,0,0,ffffffff,0,ccafba7c,c204a600,ccafba8c,ccb52ccd) at 0xc0671e77
This one, however, is consistent with the call to getblk.
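For reference, the declarations involved look roughly like this
(paraphrased; the exact prototypes may differ slightly in the tree that
produced this dump), and since daddr_t is 64 bits it takes two argument
slots on i386, which is how I'm reading the frames above:
	/* Paraphrased declarations -- check sys/sys/buf.h in your tree. */
	int	bread(struct vnode *vp, daddr_t blkno, int size,
		    kauth_cred_t cred, int flags, buf_t **bpp);
	buf_t	*getblk(struct vnode *vp, daddr_t blkno, int size,
		    int slpflag, int slptimeo);
	void	allocbuf(buf_t *bp, int size, int preserve);

	/*
	 * Reading the bread frame that way: vp = 0xcc5e0678,
	 * blkno = 0x7a0 (two slots), size = 0, cred = 0xffffffff
	 * (presumably NOCRED), flags = 0, bpp = 0xccafba7c -- i.e. the
	 * same vnode and block as the getblk frame, and again size 0.
	 */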
> procfs_vnodeop_p(c204a600,ccb59620,0,1,10,2,40,cb2efc00,cc5e0678,cb2efe0b) at 0xccb52cf4
> procfs_vnodeop_p(cc5e0678,ccaf9200,cc91b2a0,ccafd5a0,cc91b2a0,0,3,c05b4cfc,ccaf9afc,2) at 0xccb5536e
> procfs_vnodeop_p(ccaf9200,bfbfe4dc,ccafd5a0,ccafbcc0,0,ccb596e0,ccafbb8c,c067a3b1,c0a57390,ccaf9200) at 0xccb5597d
These make no sense whatsoever; procfs_vnodeop_p is not some mystical
internal function inside procfs but procfs's array of vnode op
pointers, which is data. This suggests that the code address is out of
range.
Oh, maybe these are in the msdosfs kernel module and crash doesn't
know how to cope with that? I thought someone had taught ddb about
modules, but maybe it doesn't work for crash.
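For what it's worth, that symbol is just the usual vnode-op vector
boilerplate, roughly like this (abbreviated from
miscfs/procfs/procfs_vnops.c):
	/* Abbreviated; the real table has an entry per vnode op. */
	int (**procfs_vnodeop_p)(void *);
	const struct vnodeopv_entry_desc procfs_vnodeop_entries[] = {
		{ &vop_default_desc, vn_default_error },
		{ &vop_lookup_desc, procfs_lookup },
		/* ... */
		{ NULL, NULL }
	};
	const struct vnodeopv_desc procfs_vnodeop_opv_desc =
		{ &procfs_vnodeop_p, procfs_vnodeop_entries };
Since the trace printer just reports the nearest symbol it can find at
or below the return address, a PC in a region it has no symbols for
(module text, say) can easily come out labeled with a data symbol like
this one.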
> VFS_MOUNT(ccaf9200,bfbfe4dc,ccafd5a0,ccafbcc0,ccaf9b00,ccb21f1c,cca080c0,0,67000,0) at 0xc0678604
> do_sys_mount(cc91b2a0,0,80492c5,bfbfe4dc,0,bfbfecdc,0,8c,ccafbd28,bbb30000) at 0xc068144c
> sys___mount50(cc91b2a0,ccafbd00,ccafbd28,ccafbd00,bbb30000,cb19d960,19a,80492c5,bfbfe4dc,0) at 0xc06815fd
> syscall(ccafbd48,b3,ab,1f,1f,bfbfe8dc,bfbfe4dc,bfbfed78,bfbfecdc,bbbccbc0) at 0xc05c68e9
And that all looks perfectly reasonable.
ok, I think the most likely interpretation (despite the parts that
don't make sense) is that msdosfs asked for a zero-length buffer and
this caused the buffer cache code to panic.
There are two ways to go about debugging this further. One is to use
rump_msdos, which should exhibit the same behavior but, being entirely
userlevel, can be run under a debugger and won't take the system down
when it fails.
The other is to go ahead and boot and crash a test kernel. If you
compile msdosfs in instead of loading it as a module, you'll probably
get sane output from crash; alternatively, if you include ddb and set
the ddb.onpanic sysctl to 1, you may be able to get a working
backtrace directly from ddb. Either way, enable DIAGNOSTIC for good
measure.
It may also be helpful to add this:
	if (size == 0) {
		panic("bread: zero-length buf requested\n");
	}
to the top of bread() (at around line 732 of sys/kern/vfs_bio.c),
because if that goes off it will save the trouble of wading through
the buffer code.
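With DIAGNOSTIC on, the same trip-wire could also be spelled as an
assertion, though the explicit panic has the advantage of firing on
non-DIAGNOSTIC kernels too:
	KASSERT(size > 0);	/* compiled out without DIAGNOSTIC */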
--
David A. Holland
dholland%netbsd.org@localhost