tech-kern archive


Re: Help with issue with mpt(4) driver



On Sat, 26 Jan 2013, Brian Buhrow wrote:

>         Hello.  I believe Patrick may be on to something.  Further
> investigation into my mpt(4) issues reveals that while there are still some
> steps I can take to make the mpt(4) driver more robust when it comes to
> recovering from LSI errors, I believe this particular problem is, strictly
> speaking, outside the mpt(4) driver.  The work I'm doing exacerbates the
> issue, but I can say, with a high degree of confidence, that when I see
> this filesystem lock up state, it's not because the mpt(4) driver lost some
> request.  Rather, I think, requests to the driver from the filesystem
> layer got reordered and some may have passed some time threshold, from
> which the filesystem layer never recovered.  On other NetBSD systems, I
> often see processes get stuck in uvn_fp2 wait states for long periods of
> time, when, apparently, the machine is doing nothing.  Then, after some
> indeterminate amount of time, some thread somewhere wakes up, notices the
> problem, and things take off again.  I think the trick here is to figure
> out what, exactly, is being waited on, and, once that's done, we'll be able
> to figure out what's really going on.  I suspect that once we find this
> issue, we'll actually solve a number of performance issues which haven't
> been fatal, but which have been troubling a lot of folks in one way or
> another.
>         Right now, I have a machine in this file system lockup state, and it
> has a fully symboled debug kernel on it.  Can anyone provide some examples
> of how I should go about looking at the various locks using gdb?
> Specifically, how do I get a look at the lock that's being blocked in the
> uvn_fp2 state?  I saw some examples from Chuck earlier in this thread, but
> I think that was using ddb.  Some help with gdb would be helpful if someone
> has some script snippets they care to share.

Locks won't help.  They probably aren't being held.  Just dump the state 
of the page structure and look at the flags.  Then go to the containing 
vnode and associated inode and dump the state.  And if we can find an 
associated buf structure as well, that should describe the pending I/O 
operation.
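
As a rough illustration of the "look at the flags" step, here is a small
sketch for decoding the flags word once it has been printed from the page
structure in a debugger.  The helper name decode_flags is hypothetical, and
the bit values in the table are an assumption based on sys/uvm/uvm_page.h
from around this time; check them against the source tree the kernel was
built from before trusting the output.

```python
# Hypothetical helper: decode a vm_page "flags" word copied out of a debugger.
# ASSUMPTION: these bit values match sys/uvm/uvm_page.h in the tree the
# kernel was built from.  Verify them before relying on the decoded names.
PG_FLAGS = {
    0x0001: "PG_BUSY",      # page is locked (owned by someone)
    0x0002: "PG_WANTED",    # someone is waiting for the page
    0x0004: "PG_TABLED",    # page is in the object's page table
    0x0008: "PG_CLEAN",     # page has not been modified
    0x0010: "PG_PAGEOUT",   # page to be freed for the pagedaemon
    0x0020: "PG_RELEASED",  # page to be freed when unbusied
    0x0040: "PG_FAKE",      # page is not yet initialized
    0x0080: "PG_RDONLY",    # page must be mapped read-only
}


def decode_flags(value, table=PG_FLAGS):
    """Return the names of all bits set in value, lowest bit first."""
    names = [name for bit, name in sorted(table.items()) if value & bit]

    # Report any bits the table does not know about instead of hiding them.
    known = 0
    for bit in table:
        known |= bit
    leftover = value & ~known
    if leftover:
        names.append("unknown(0x%x)" % leftover)
    return names


if __name__ == "__main__":
    # Example: a flags word of 0x0003 as printed by the debugger.
    print(decode_flags(0x0003))
```

If the assumed table is right, a page showing both PG_BUSY and PG_WANTED
would mean somebody owns the page and at least one thread is sleeping on
it, which is the case where the vnode, inode and buf state become the
interesting things to dump next.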

Eduardo

