NetBSD-Bugs archive


Re: kern/43375: panic when mount(8)ing umass(4) device (Sony DSC-H50 Digital Camera)



The following reply was made to PR kern/43375; it has been noted by GNATS.

From: David Holland <dholland-bugs%netbsd.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: kern/43375: panic when mount(8)ing umass(4) device (Sony
        DSC-H50 Digital Camera)
Date: Mon, 31 May 2010 02:09:15 +0000

 On Sat, May 29, 2010 at 07:25:02PM +0000, Alan R. S. Bueno wrote:
  >  > Do you have that traceback, or can you get one from the dump?
  >  
  >  Yes. The system reboots after the panic, so no chance to ddb(4).
  >  
  >  Let me know if I'm doing something wrong.
 
 You may be, because the trace doesn't entirely make sense.
 
  >  middle-earth# gunzip /var/crash/netbsd.5.core.gz
  >  middle-earth# crash -M netbsd.5.core
  >  Crash version 5.99.29, image version 5.99.29.
  >  System panicked: buf mem pool index %d
  >  Backtrace from time of crash is available.
 
 This looks fine in principle; however, were you running the same
 kernel that generated the dump? If it's not quite the same, that would
 explain the trace.
 
 If in doubt, use crash's -N /netbsd.whatever option to feed it the
 same kernel that generated the core.
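 
 In this case savecore should have saved the matching kernel image next
 to the core, so something like this (adjusting the names to whatever
 you actually have in /var/crash):
 
        crash -M /var/crash/netbsd.5.core -N /var/crash/netbsd.5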
 
  >  crash> trace
  >  _KERNEL_OPT_NAPMBIOS(104,0,c0491beb,c09938a7,0,ccafb94c,104,0,0,ccafb94c) at 0
  >  procfs_vnodeop_p(104,0,c09b2180,cc91b2a0,0,2,0,ccafb958,0,0) at 0xcc91b2a0
 
 These don't make sense being here, but might just be junk left on the
 stack (that's one of the hazards with ddb...)
 
 procfs_vnodeop_p does not make sense either because it's data.
 
  >  panic(c09938a7,17,c23c32d0,0,0,cc5e0678,ccafb98c,c06707f3,cc5e0678,0)
  >  at 0xc05b4582
  >  buf_mempoolidx(cc5e0678,0,b0,c23c32d0,0,0,ccafb9bc,c0671b6b,c23c32d0,0)
  >  at 0xc066fe02
 
 If buf_mempoolidx was really passed 0xcc5e0678, it's no wonder it
 panicked. And this would in fact produce "index 23". However, I wonder
 about this, for the reasons outlined below... but I don't know why ddb
 would get the argument list wrong here, or, if it did, why only this
 one would be wrong and not any of the others.
 
  >  allocbuf(c23c32d0,0,0,0,cc91b310,cc5e0678,0,ccafba7c,cc5e0678,0) at 0xc06707f3
 
 allocbuf doesn't call buf_mempoolidx, but it does call all three of
 the functions that do, two of which aren't called anywhere else, so
 let's assume one of those was inlined.
 
 The second argument is the new buffer size, and it's apparently 0.
 If passed to buf_mempoolidx, this would cause the same panic, also
 with index 23.
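 
 For reference, the index computation in buf_mempoolidx() is roughly
 the following (paraphrased from memory; see sys/kern/vfs_bio.c for the
 real thing, and note that u_long is 32 bits on i386). A quick userland
 check shows that both 0xcc5e0678 and 0 land on index 23, which matches
 the panic:
 
        #include <stdint.h>
        #include <stdio.h>
 
        /*
         * Rough paraphrase of buf_mempoolidx(); the smallest buffer
         * pool is 512 bytes, hence the shift by 9, and NMEMPOOLS is
         * small (around 8 on i386), so index 23 is way out of range.
         */
        static unsigned
        mempoolidx(uint32_t size)
        {
                unsigned n = 0;
 
                size -= 1;      /* 0 wraps around to 0xffffffff here */
                size >>= 9;     /* MEMPOOL_INDEX_OFFSET */
                while (size) {
                        size >>= 1;
                        n += 1;
                }
                return n;       /* the kernel panics if n >= NMEMPOOLS */
        }
 
        int
        main(void)
        {
                printf("%u\n", mempoolidx(0xcc5e0678)); /* prints 23 */
                printf("%u\n", mempoolidx(0));          /* prints 23 too */
                return 0;
        }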
 
 However, this isn't consistent with the previous line: while
 0xcc5e0678 is in the argument list here, allocbuf only actually takes
 three arguments, and it looks unlikely that it or any of the inlined
 functions could synthesize 0xcc5e0678 to pass through to
 buf_mempoolidx. Especially since that same value appears further down
 as the first argument to getblk and bread, which suggests that it's a
 vnode.
 
  >  getblk(cc5e0678,7a0,0,0,0,0,ccafb9fc,c066fe6e,cc5e0678,cc5e0678) at 0xc0671b6b
 
 This appears to be asking for a buffer of length 0 for block 0x7a0 of
 vnode 0xcc5e0678.
 
  >  bio_doread(0,ffffffff,0,c06700aa,c23c33d4,cc5e0678,cb18ce00,f0000,0,c204a600)
  >  at 0xc0671c7e
 
 This makes no sense at all (block -1, size 0, of a null vnode), but
 one has to go through bio_doread to get from bread to getblk.
 
  >  bread(cc5e0678,7a0,0,0,ffffffff,0,ccafba7c,c204a600,ccafba8c,ccb52ccd)
  >  at 0xc0671e77
 
 This one, however, is consistent with the call to getblk.
 
  >  procfs_vnodeop_p(c204a600,ccb59620,0,1,10,2,40,cb2efc00,cc5e0678,cb2efe0b)
  >  at 0xccb52cf4
  >  procfs_vnodeop_p(cc5e0678,ccaf9200,cc91b2a0,ccafd5a0,cc91b2a0,0,3,c05b4cfc,ccaf9afc,2)
  >  at 0xccb5536e
  >  procfs_vnodeop_p(ccaf9200,bfbfe4dc,ccafd5a0,ccafbcc0,0,ccb596e0,ccafbb8c,c067a3b1,c0a57390,ccaf9200)
  >  at 0xccb5597d
 
 These make no sense whatsoever; procfs_vnodeop_p is not some mystical
 internal function inside procfs but procfs's array of vnode op
 pointers, which is data. This suggests that the code address is out of
 range.
 
 Oh, maybe these are in the msdosfs kernel module and crash doesn't
 know how to cope with that? I thought someone had taught ddb about
 modules, but maybe it doesn't work for crash.
 
  >  VFS_MOUNT(ccaf9200,bfbfe4dc,ccafd5a0,ccafbcc0,ccaf9b00,ccb21f1c,cca080c0,0,67000,0)
  >  at 0xc0678604
  >  do_sys_mount(cc91b2a0,0,80492c5,bfbfe4dc,0,bfbfecdc,0,8c,ccafbd28,bbb30000)
  >  at 0xc068144c
  >  sys___mount50(cc91b2a0,ccafbd00,ccafbd28,ccafbd00,bbb30000,cb19d960,19a,80492c5,bfbfe4dc,0)
  >  at 0xc06815fd
  >  syscall(ccafbd48,b3,ab,1f,1f,bfbfe8dc,bfbfe4dc,bfbfed78,bfbfecdc,bbbccbc0)
  >  at 0xc05c68e9
 
 And that all looks perfectly reasonable.
 
 OK, I think the most likely interpretation (despite the parts that
 don't make sense) is that msdosfs asked for a zero-length buffer and
 that this caused the buffer cache code to panic.
 
 There are two ways to go about debugging this further. One is to use
 rump_msdos, which should exhibit the same behavior but, being entirely
 userlevel, can be run in a debugger and won't kill the system when it
 fails.
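 
 rump_msdos takes mount_msdos-style arguments, so something along these
 lines (untested; the device name will be whatever the camera attaches
 as) should reproduce the problem under the debugger:
 
        gdb --args rump_msdos /dev/sd0e /mnt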
 
 The other is to go ahead and boot and crash a test kernel. If you
 compile msdosfs in instead of loading it as a module, you'll probably
 get sane output from crash; alternatively, if you include ddb and set
 the ddb.onpanic sysctl to 1, you may be able to get a working
 backtrace directly from ddb. Either way, enable DIAGNOSTIC for good
 measure.
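 
 For the ddb route, the sysctl can be set at runtime with:
 
        sysctl -w ddb.onpanic=1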
 
 It may also be helpful to add this:
 
        if (size == 0) {
                panic("bread: zero-length buf requested\n");
        }
 
 to the top of bread() (at around line 732 of sys/kern/vfs_bio.c),
 because if that goes off it will save the trouble of wading through
 the buffer code.
 
 -- 
 David A. Holland
 dholland%netbsd.org@localhost
 

