Current-Users archive


pool_grow hangs (Re: CVS commit: src/sys)



On Sat, 16 Dec 2017 03:13:29 +0000
matthew green <mrg%netbsd.org@localhost> wrote:

> Module Name:	src
> Committed By:	mrg
> Date:		Sat Dec 16 03:13:29 UTC 2017
> 
> Modified Files:
> 	src/sys/kern: subr_pool.c
> 	src/sys/sys: pool.h
> 
> Log Message:
> hopefully workaround the irregularly "fork fails in init" problem.
> 
> if a pool is growing, and the grower is PR_NOWAIT, mark this.
> if another caller wants to grow the pool and is also PR_NOWAIT,
> busy-wait for the original caller, which should either succeed
> or hard-fail fairly quickly.
> 
> implement the busy-wait by unlocking and relocking this pools
> mutex and returning ERESTART.  other methods (such as having
> the caller do this) were significantly more code and this hack
> is fairly localised.
> 
> ok chs@ riastradh@

Hi!

I have an easily reproducible system hang that I believe originates
from this change. It can be triggered by doing lots of block and
network i/o (for example, three concurrent rsyncs) on a uniprocessor
system running under Linux KVM.

Basically what happens is that for unknown reasons the PR_NOWAIT
grower blocks forever when it tries to reacquire the pool lock to
do pool_prime_page() in pool_grow().

Meanwhile another process, waiting for the grower to finish, is
spinning forever at 100% doing the mutex_exit/mutex_enter/ERESTART
thing on the same pool. It looks to me like the grower never actually
gets scheduled to run.

Also, although it doesn't fix the issue, this pr_flags modification
looks like it should be moved to after the mutex is acquired:

        pp->pr_flags &= ~(PR_GROWING|PR_GROWINGNOWAIT);
        mutex_enter(&pp->pr_lock);
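
To illustrate the ordering I have in mind, here is a minimal userland
sketch. It is not the kernel code: a pthread mutex stands in for
pr_lock, and struct pool_model, pool_model_grow_done() and the flag
values are made up for illustration only.

```c
#include <assert.h>
#include <pthread.h>

/* Illustrative flag values only; the real ones are defined in sys/pool.h. */
#define PR_GROWING        0x1u
#define PR_GROWINGNOWAIT  0x2u

/* Hypothetical stand-in for struct pool: just the lock and the flags. */
struct pool_model {
	pthread_mutex_t pr_lock;
	unsigned int    pr_flags;
};

/*
 * Models the end of the grow path: the lock was dropped for the
 * allocation and is retaken here BEFORE pr_flags is modified, so the
 * flags are only ever changed while the lock is held.
 * Returns with pr_lock held.
 */
static void
pool_model_grow_done(struct pool_model *pp)
{
	pthread_mutex_lock(&pp->pr_lock);
	assert(pp->pr_flags & PR_GROWING);
	pp->pr_flags &= ~(PR_GROWING | PR_GROWINGNOWAIT);
}
```

In this model a waiter that inspects pr_flags under the lock can no
longer observe the flags being cleared while the grower does not yet
hold the lock. As said above, this reordering alone does not make the
hang go away.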

Kind regards,
-Tobias
