tech-kern archive


Re: Help with issue with mpt(4) driver



        See below.


> I have just been experiencing filesystem lock-up with a process in
> uvn_fp2, so it may be unrelated to your mpt fiddling... That system's
> disks are on ahcisata.
>
> It can withstand builds of the world, but not GraphicsMagick:
>
> struct proc *  ffff fe81 4c8c 1d10
> uarea *        ffff fe81 3d6d 5d80
> vmspace/vm_map ffff fe81 0f23 2470
>
> PID   LID S CPU     FLAGS       STRUCT LWP *   NAME WAIT
> 917     4 3   1  10000080   fffffe8147dac180     gm psem
> 917     3 3   2        80   fffffe8147dac5a0     gm psem
> 917     2 3   3  10000080   fffffe8147dac9c0     gm psem
> 917     1 3   4  10000000   fffffe813b57e160     gm uvn_fp2
>
> wmesg psem wchan fffffe81106a1658
>
> lwp 1 fffffe813b57e160 pcb fffffe813d6c9d80
>   stat 3 flags 10000000 cpu 1 pri 43
>   wmesg uvn_fp2 wchan ffff8000003e33f8
>
> ? VNODE flags 0x30<MPSAFE,LOCKSWORK>
> v_lock ffff fe81 1427 0280
>
> but I'm not sure what to look for...
>
> Cheers,
>
> Patrick
>

        Hello.  I believe Patrick may be on to something.  Further
investigation into my mpt(4) issues reveals that, while there are still
steps I can take to make the mpt(4) driver more robust at recovering from
LSI errors, this particular problem is, strictly speaking, outside the
mpt(4) driver.  My work exacerbates the issue, but I can say with a high
degree of confidence that when I see this filesystem lockup, it is not
because the mpt(4) driver lost a request.  Rather, I think requests from
the filesystem layer to the driver got reordered, and some may have
exceeded a time threshold from which the filesystem layer never recovered.
On other NetBSD systems, I often see processes stuck in uvn_fp2 wait states
for long periods while the machine is apparently idle.  Then, after some
indeterminate amount of time, some thread somewhere wakes up, notices the
problem, and things take off again.  I think the trick is to figure out
exactly what is being waited on; once we know that, we'll be able to see
what's really going on.  I suspect that once we find this issue, we'll also
solve a number of performance problems that haven't been fatal but have
troubled a lot of folks in one way or another.
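
        One way to frame "what is being waited on": assuming, as in NetBSD's
uvn_findpage(), that uvn_fp2 is the sleep taken when the page is already
PG_BUSY (the waiter sets PG_WANTED and sleeps with the page itself as the
wchan), the waiter only wakes when whoever owns the busy page, typically a
pending I/O, clears PG_BUSY and issues the wakeup.  If that assumption
holds, the wchan in Patrick's trace (ffff8000003e33f8) would be the address
of a struct vm_page, which could be examined in gdb with something like
"print *(struct vm_page *)0xffff8000003e33f8" to see its flags and owning
object.  The C sketch below is only a userspace pthread model of that
busy/wanted handshake, not kernel code; the model_page type and the page_*
functions are hypothetical names for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace model of the busy/wanted page handshake.
 * This is NOT NetBSD kernel code; all names are illustrative. */
struct model_page {
    pthread_mutex_t lock; /* stands in for the object lock around flag changes */
    pthread_cond_t cv;    /* stands in for the sleep queue keyed on wchan == &page */
    bool busy;            /* analogue of PG_BUSY: an owner (e.g. pending I/O) holds it */
    bool wanted;          /* analogue of PG_WANTED: someone is sleeping on the page */
};

void page_init(struct model_page *pg)
{
    pthread_mutex_init(&pg->lock, NULL);
    pthread_cond_init(&pg->cv, NULL);
    pg->busy = false;
    pg->wanted = false;
}

/* Owner side: mark the page busy before starting I/O on it. */
void page_busy(struct model_page *pg)
{
    pthread_mutex_lock(&pg->lock);
    pg->busy = true;
    pthread_mutex_unlock(&pg->lock);
}

/* Owner side: I/O finished.  Clear busy and wake any waiters.
 * If this is never called, waiters sleep indefinitely. */
void page_unbusy(struct model_page *pg)
{
    pthread_mutex_lock(&pg->lock);
    pg->busy = false;
    if (pg->wanted) {
        pg->wanted = false;
        pthread_cond_broadcast(&pg->cv);
    }
    pthread_mutex_unlock(&pg->lock);
}

/* Waiter side, analogous to the uvn_fp2 sleep: set "wanted" and sleep until
 * the page is no longer busy.  Unlike the kernel sleep (which, as far as I
 * know, has no timeout), this is bounded by timeout_sec so the stuck case
 * can be observed.  Returns 0 or ETIMEDOUT. */
int page_wait(struct model_page *pg, int timeout_sec)
{
    struct timespec deadline;
    int rc = 0, result;

    clock_gettime(CLOCK_REALTIME, &deadline); /* default condvar clock */
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&pg->lock);
    while (pg->busy && rc == 0) {   /* loop also absorbs spurious wakeups */
        pg->wanted = true;
        rc = pthread_cond_timedwait(&pg->cv, &pg->lock, &deadline);
    }
    result = pg->busy ? ETIMEDOUT : 0;
    pthread_mutex_unlock(&pg->lock);
    return result;
}
```

In this model, after page_busy(&pg), page_wait(&pg, 1) returns ETIMEDOUT
because nothing ever unbusies the page; after page_unbusy(&pg), page_wait
returns 0.  The point of the sketch is that a uvn_fp2 sleeper is not stuck on
a lock as such, but on whoever owns the busy page, so the useful question is
which I/O owns that page and why it never completed.
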
        Right now, I have a machine in this filesystem lockup state, and it
has a fully symboled debug kernel on it.  Can anyone provide some examples
of how to examine the various locks using gdb?  Specifically, how do I look
at the lock being waited on in the uvn_fp2 state?  I saw some examples from
Chuck earlier in this thread, but I think those used ddb.  If someone has
gdb script snippets they care to share, that would be a great help.

-thanks
-Brian



