tech-kern archive


Re: Possible buffer cache race?



On Sun, Oct 23, 2016 at 06:27:09PM +0200, Jaromír Doleček wrote:
 > I have the filesystem mounted async and the machine has huge amount of
 > RAM, without logging at the moment. So it's mostly buffer cache
 > exercise, with i/o spikes on sync.
 > 
 > I see interesting thing - periodically, all of the tar processes get
 > blocked sleeping on either tstile, biolock or pager_map. All the tar
 > processes block. When I just wait they stay blocked. When I call
 > sync(8), all of them unblock and continue running, until again they
 > all hit the same condition later.  When I keep calling sync,
 > eventually all processes finish.

Is this correlated with the syncer running? I have been seeing a
problem where every time the syncer runs it locks out everything else,
and once that happens some things take several seconds to complete
afterwards.

I haven't had time to look into it at all, but it's rather a serious
problem and it seems like these might actually be the same - I guess
it would be because the syncer unleashes a flood of previously stuck
requests that choke everything up. Or maybe the syncer is ultimately
causing your problem as well even though you're mounting async.

Can you get a process to block at the end of your test workload (or
stop your test workload with stuff blocked) and examine it with crash?
The tstile ones are unlikely to be interesting (if you have a bunch of
tstiles, pick a process that isn't in tstile; that'll be the one the
rest are waiting behind) but the others might be. Knowing where it
gets stuck will help a lot; if it doesn't get stuck in any one place,
that's useful to know too.

The most common cause of missed wakeups is not going to sleep
atomically, so it's usually a property of a particular sleep site.
Next most common is races between changing things affecting the sleep
condition and posting a wakeup; in particular, unlocked wakeup before
change (or unlocked wakeup after change without a memory barrier to
enforce the ordering) leads to problems.
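
To illustrate the two rules above, here is a minimal userland sketch
using POSIX threads rather than the kernel primitives. It is not taken
from the buffer cache code; struct gate, gate_wait, gate_open and
gate_demo are hypothetical names made up for this example. The sleeper
tests the condition and goes to sleep under the same lock, and the
waker changes the condition and posts the wakeup under that lock, so
neither kind of missed wakeup can happen.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example state: one flag, guarded by one mutex. */
struct gate {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    bool            ready;      /* the sleep condition */
};

void gate_init(struct gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cv, NULL);
    g->ready = false;
}

/*
 * Sleeper: the test of the condition and the sleep are atomic with
 * respect to the waker, because pthread_cond_wait() releases the mutex
 * and blocks in a single step. The loop re-checks the condition after
 * every wakeup, which also covers spurious wakeups.
 */
void gate_wait(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (!g->ready)
        pthread_cond_wait(&g->cv, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/*
 * Waker: change the condition first, then post the wakeup, both while
 * holding the lock the sleeper uses to check the condition.
 */
void gate_open(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->ready = true;
    pthread_cond_broadcast(&g->cv);
    pthread_mutex_unlock(&g->lock);
}

static void *waiter_thread(void *arg)
{
    gate_wait(arg);
    return NULL;
}

/* Run one sleeper against one waker; returns 0 if the sleeper woke up. */
int gate_demo(void)
{
    struct gate g;
    pthread_t t;

    gate_init(&g);
    if (pthread_create(&t, NULL, waiter_thread, &g) != 0)
        return -1;
    gate_open(&g);
    if (pthread_join(t, NULL) != 0)
        return -1;
    return g.ready ? 0 : -1;
}
```

The broken variants are the ones described above: checking g->ready
without the lock and then sleeping (the wakeup can land between the
check and the sleep), or calling pthread_cond_broadcast() without the
lock before setting g->ready (the sleeper can wake, see the old value,
and go back to sleep for good). In the kernel the same shape would be
written with the mutex(9) and condvar(9) interfaces.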

-- 
David A. Holland
dholland%netbsd.org@localhost

