Re: Serious WAPL performance problems

To: Mindaugas Rasiukevicius <rmind%NetBSD.org@localhost>, David Holland <dholland-tech%NetBSD.org@localhost>
Subject: Re: Serious WAPL performance problems
From: Brian Buhrow <buhrow%nfbcal.org@localhost>
Date: Tue, 23 Oct 2012 16:38:03 -0700

On Oct 24, 12:07am, Mindaugas Rasiukevicius wrote:
} Subject: Re: Serious WAPL performance problems
} David Holland <dholland-tech%netbsd.org@localhost> wrote:
} > On Tue, Oct 23, 2012 at 07:53:28PM +0200, Edgar Fuß wrote:
} >  > > the output of ps -lax on the NFS server during the 18-20 second
} >  > > window
} >  >
} >  > As far as I remember (you need the s option, too), the main nfsd
} >  > thread is on select, one subthread on biowait or biolock and the
} >  > others on tstile.
} > 
} > It would really be nice to know what those others are waiting behind.
} > 
} > paging rmind...
} > 
} > 
} > Explanation to those following along at home: rmind has been claiming
} > for years that there is no need to have real wchan names instead of
} > "tstile", which just means "I'm waiting for something". He claims it's
} > easy to diagnose problems without that information. So when one comes
} > up it's time for him to prove it. :-)
} > 
} 
} "Easy to diagnose problems"?  Plain false.  The lock naming you are talking
} about would give no *more* information than "it is a vnode lock", and one
} can guess already that it is most likely the case here (what a surprise!).
} To diagnose the problem, one needs quite more information than your useless
} lock naming would provide: for example, in almost any case, backtraces of
} the LWPs are required to figure out what is going on.  Unless the case is
} very obvious/simplistic, lock naming will not explain the deadlock without
} the backtraces.  I have said this multiple times.  If you are unable to see
} the difference between the statements, then I cannot help you.  Nor am I
} interested in discussion with somebody who does not bother to listen.

        Hello.  I think you two are talking past each other.  While it's true
that having a lock name isn't necessarily enough information to diagnose a
problem, it's a lot better than having nothing.  I've worked on systems
where all you could get was an address of a lock, which was different on
every system, and as a result, it was nearly impossible to diagnose
issues in the field at all.  With lock names, you can search through the
source code and find where those locks are taken, and, potentially, where
they're released.  Recently I found a problem with the ahc(4) driver where
it issues a command to a controller and goes to sleep waiting for a
response.  If the controller goes out to lunch and never answers the call,
the driver gets stuck forever in that spot.  With the lock name, I was
quickly able to determine what was wrong and, potentially, fix the issue.
If I'd only had an address, I'd still be scratching my head about the
issue.  I agree with rmind that names aren't always useful, but I'd sure
rather have them than not.  And, the more descriptive and unique they are
in the source tree, the more useful they are.
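
        To make the ahc(4) case concrete, here is a minimal user-space sketch
of the pattern, written with pthreads.  It is not the driver's code and not
kernel code; struct named_wait, its functions and the "ctlrwait" name are
all hypothetical.  It only illustrates two things: the sleep point carries
a descriptive name you could search the source tree for, and the sleep has
a timeout, so a controller that never answers produces an error instead of
a thread stuck forever.

```c
/* User-space sketch only; every name here is hypothetical. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* A sleep point that carries a descriptive, greppable name. */
struct named_wait {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	const char	*wmesg;	/* what a ps-like tool would display */
	bool		done;	/* set when the "controller" answers */
};

static void
named_wait_init(struct named_wait *w, const char *wmesg)
{
	/* Insist on a real name rather than an anonymous sleep. */
	assert(wmesg != NULL && wmesg[0] != '\0');
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cv, NULL);
	w->wmesg = wmesg;
	w->done = false;
}

/* Called by whoever plays the controller when the command completes. */
static void
named_wait_wakeup(struct named_wait *w)
{
	pthread_mutex_lock(&w->lock);
	w->done = true;
	pthread_cond_broadcast(&w->cv);
	pthread_mutex_unlock(&w->lock);
}

/*
 * Sleep until the command completes or timeout_ms elapses.
 * Returns 0 on completion, ETIMEDOUT if nothing ever answered.
 */
static int
named_wait_sleep(struct named_wait *w, long timeout_ms)
{
	struct timespec deadline;
	int error = 0;
	bool done;

	/* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&w->lock);
	while (!w->done && error == 0)
		error = pthread_cond_timedwait(&w->cv, &w->lock, &deadline);
	done = w->done;
	pthread_mutex_unlock(&w->lock);

	return done ? 0 : error;
}
```

        With something like this, a thread waiting on a dead controller
shows up under a name that points at one place in the source, and the
caller gets ETIMEDOUT back and can decide what to do about it.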
-Brian
