NetBSD-Bugs archive


Re: kern/43375: panic when mount(8)ing umass(4) device (Sony DSC-H50 Digital Camera)



The following reply was made to PR kern/43375; it has been noted by GNATS.

From: David Holland <dholland-bugs%netbsd.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: kern/43375: panic when mount(8)ing umass(4) device (Sony
        DSC-H50 Digital Camera)
Date: Mon, 31 May 2010 02:09:15 +0000

 On Sat, May 29, 2010 at 07:25:02PM +0000, Alan R. S. Bueno wrote:
  >  > Do you have that traceback, or can you get one from the dump?
  >  
  >  Yes. The system reboots after the panic, so no chance to ddb(4).
  >  
  >  Let me know if I'm doing something wrong.
 
 You may be, because the trace doesn't entirely make sense.
 
  >  middle-earth# gunzip /var/crash/netbsd.5.core.gz
  >  middle-earth# crash -M netbsd.5.core
  >  Crash version 5.99.29, image version 5.99.29.
  >  System panicked: buf mem pool index %d
  >  Backtrace from time of crash is available.
 
 This looks fine in principle; however, were you running the same
 kernel that generated the dump? If it's not quite the same, that would
 explain the trace.
 
 If in doubt, use crash's -N /netbsd.whatever option to feed it the
 same kernel that generated the core.
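 
 In this case savecore should have saved the matching kernel image next
 to the core, so something like this (adjusting the names to whatever
 you actually have in /var/crash):
 
        crash -M /var/crash/netbsd.5.core -N /var/crash/netbsd.5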
 
  >  crash> trace
  >  _KERNEL_OPT_NAPMBIOS(104,0,c0491beb,c09938a7,0,ccafb94c,104,0,0,ccafb94c) at 0
  >  procfs_vnodeop_p(104,0,c09b2180,cc91b2a0,0,2,0,ccafb958,0,0) at 0xcc91b2a0
 
 These don't make sense being here, but might just be junk left on the
 stack (that's one of the hazards with ddb...)
 
 procfs_vnodeop_p does not make sense either because it's data.
 
  >  panic(c09938a7,17,c23c32d0,0,0,cc5e0678,ccafb98c,c06707f3,cc5e0678,0)
  >  at 0xc05b4582
  >  buf_mempoolidx(cc5e0678,0,b0,c23c32d0,0,0,ccafb9bc,c0671b6b,c23c32d0,0)
  >  at 0xc066fe02
 
 If buf_mempoolidx was really passed 0xcc5e0678, it's no wonder it
 panicked. And this would in fact produce "index 23". However, I wonder
 about this, for the reasons outlined below... but I don't know why ddb
 would get the argument list wrong here, or, if it did, why only this
 one would be wrong and not any of the others.
 
  >  allocbuf(c23c32d0,0,0,0,cc91b310,cc5e0678,0,ccafba7c,cc5e0678,0) at 0xc06707f3
 
 allocbuf doesn't call buf_mempoolidx, but it does call all three of
 the functions that do, two of which aren't called anywhere else, so
 let's assume one of those was inlined.
 
 The second argument is the new buffer size, and it's apparently 0.
 If passed to buf_mempoolidx, this would cause the same panic, also
 with index 23.
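 
 For reference, the index computation in buf_mempoolidx() is roughly
 the following (paraphrased from memory; see sys/kern/vfs_bio.c for the
 real thing, and note that u_long is 32 bits on i386). A quick userland
 check shows that both 0xcc5e0678 and 0 land on index 23, which matches
 the panic:
 
        #include <stdint.h>
        #include <stdio.h>
 
        /*
         * Rough paraphrase of buf_mempoolidx(); the smallest buffer
         * pool is 512 bytes, hence the shift by 9, and NMEMPOOLS is
         * small (around 8 on i386), so index 23 is way out of range.
         */
        static unsigned
        mempoolidx(uint32_t size)
        {
                unsigned n = 0;
 
                size -= 1;      /* 0 wraps around to 0xffffffff here */
                size >>= 9;     /* MEMPOOL_INDEX_OFFSET */
                while (size) {
                        size >>= 1;
                        n += 1;
                }
                return n;       /* the kernel panics if n >= NMEMPOOLS */
        }
 
        int
        main(void)
        {
                printf("%u\n", mempoolidx(0xcc5e0678)); /* prints 23 */
                printf("%u\n", mempoolidx(0));          /* prints 23 too */
                return 0;
        }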
 
 However, this isn't consistent with the previous line: while
 0xcc5e0678 is in the argument list here, allocbuf only actually takes
 three arguments, and it looks unlikely that it or any of the inlined
 functions could synthesize 0xcc5e0678 to pass through to
 buf_mempoolidx. Especially since that same value appears further down
 as the first argument to getblk and bread, which suggests that it's a
 vnode.
 
  >  getblk(cc5e0678,7a0,0,0,0,0,ccafb9fc,c066fe6e,cc5e0678,cc5e0678) at 0xc0671b6b
 
 This appears to be asking for a buffer of length 0 for block 0x7a0 of
 vnode 0xcc5e0678.
 
  >  bio_doread(0,ffffffff,0,c06700aa,c23c33d4,cc5e0678,cb18ce00,f0000,0,c204a600)
  >  at 0xc0671c7e
 
 This makes no sense at all (block -1, size 0, of a null vnode), but
 one has to go through bio_doread to get from bread to getblk.
 
  >  bread(cc5e0678,7a0,0,0,ffffffff,0,ccafba7c,c204a600,ccafba8c,ccb52ccd)
  >  at 0xc0671e77
 
 This one, however, is consistent with the call to getblk.
 
  >  procfs_vnodeop_p(c204a600,ccb59620,0,1,10,2,40,cb2efc00,cc5e0678,cb2efe0b)
  >  at 0xccb52cf4
  >  procfs_vnodeop_p(cc5e0678,ccaf9200,cc91b2a0,ccafd5a0,cc91b2a0,0,3,c05b4cfc,ccaf9afc,2)
  >  at 0xccb5536e
  >  procfs_vnodeop_p(ccaf9200,bfbfe4dc,ccafd5a0,ccafbcc0,0,ccb596e0,ccafbb8c,c067a3b1,c0a57390,ccaf9200)
  >  at 0xccb5597d
 
 These make no sense whatsoever; procfs_vnodeop_p is not some mystical
 internal function inside procfs but procfs's array of vnode op
 pointers, which is data. This suggests that the code address is out of
 range.
 
 Oh, maybe these are in the msdosfs kernel module and crash doesn't
 know how to cope with that? I thought someone had taught ddb about
 modules, but maybe it doesn't work for crash.
 
  >  VFS_MOUNT(ccaf9200,bfbfe4dc,ccafd5a0,ccafbcc0,ccaf9b00,ccb21f1c,cca080c0,0,67000,0)
  >  at 0xc0678604
  >  do_sys_mount(cc91b2a0,0,80492c5,bfbfe4dc,0,bfbfecdc,0,8c,ccafbd28,bbb30000)
  >  at 0xc068144c
  >  sys___mount50(cc91b2a0,ccafbd00,ccafbd28,ccafbd00,bbb30000,cb19d960,19a,80492c5,bfbfe4dc,0)
  >  at 0xc06815fd
  >  syscall(ccafbd48,b3,ab,1f,1f,bfbfe8dc,bfbfe4dc,bfbfed78,bfbfecdc,bbbccbc0)
  >  at 0xc05c68e9
 
 And that all looks perfectly reasonable.
 
 OK, I think the most likely interpretation (despite the parts that
 don't make sense) is that msdosfs asked for a zero-length buffer and
 that this caused the buffer cache code to panic.
 
 There are two ways to go about debugging this further. One is to use
 rump_msdos, which should exhibit the same behavior but, being entirely
 userlevel, can be run in a debugger and won't kill the system when it
 fails.
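 
 rump_msdos takes mount_msdos-style arguments, so something along these
 lines (untested; the device name will be whatever the camera attaches
 as) should reproduce the problem under the debugger:
 
        gdb --args rump_msdos /dev/sd0e /mnt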
 
 The other is to go ahead and boot and crash a test kernel. If you
 compile msdosfs in instead of loading it as a module, you'll probably
 get sane output from crash; alternatively, if you include ddb and set
 the ddb.onpanic sysctl to 1, you may be able to get a working
 backtrace directly from ddb. Either way, enable DIAGNOSTIC for good
 measure.
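 
 For the ddb route, the sysctl can be set at runtime with:
 
        sysctl -w ddb.onpanic=1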
 
 It may also be helpful to add this:
 
        if (size == 0) {
                panic("bread: zero-length buf requested\n");
        }
 
 to the top of bread() (at around line 732 of sys/kern/vfs_bio.c),
 because if that goes off it will save the trouble of wading through
 the buffer code.
 
 -- 
 David A. Holland
 dholland%netbsd.org@localhost
 

