Re: continued zfs-related lockups

To: Greg Troxel <gdt%lexort.com@localhost>
Subject: Re: continued zfs-related lockups
From: Chuck Silvers <chuq%chuq.com@localhost>
Date: Sun, 8 Dec 2024 09:22:33 -0800

On Thu, Oct 24, 2024 at 09:42:55AM -0400, Greg Troxel wrote:
>   I see processes in flt_noram5 and they persistently remain there after
>   RAM becomes available.

"flt_noram5" means "wait until the pagedaemon signals that it has finished
a cycle of trying to free pages".  when threads stay stuck here even after
pages have been freed then that usually means the pagedaemon is hung in
a locking deadlock.  what is the stack trace of the pagedaemon thread
in your hangs?
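
A toy user-space model may help picture the failure mode described above. This is only a sketch and not NetBSD code: the class `PageDaemonModel` and its methods are hypothetical names invented for illustration, and a Python condition variable stands in for the kernel's sleep/wakeup mechanism. The point it shows is that the waiters are woken by the "cycle finished" signal, not by the free-page count itself, so a hung pagedaemon leaves them asleep even when pages are free.

```python
import threading


class PageDaemonModel:
    """Toy model (not NetBSD code) of threads waiting in "flt_noram5"."""

    def __init__(self):
        self.cond = threading.Condition()
        self.free_pages = 0
        self.cycles = 0  # number of pagedaemon cycles that have completed

    def wait_for_cycle(self, timeout):
        """Sleep until the modeled pagedaemon finishes another cycle.

        Returns True if a cycle completed, False if the timeout expired.
        Note that the free-page count is deliberately not consulted here.
        """
        with self.cond:
            seen = self.cycles
            return self.cond.wait_for(lambda: self.cycles != seen, timeout)

    def free_without_signal(self, n):
        """Pages become free, but the (hung) pagedaemon never signals."""
        with self.cond:
            self.free_pages += n

    def finish_cycle(self, n):
        """A healthy pagedaemon frees pages and wakes every waiter."""
        with self.cond:
            self.free_pages += n
            self.cycles += 1
            self.cond.notify_all()
```

In this model, calling `free_without_signal(100)` and then `wait_for_cycle(0.05)` times out even though pages are free, whereas a waiter in `wait_for_cycle()` returns as soon as another thread calls `finish_cycle()`.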

>   - Is there a way in ddb to issue a wakeup on flt_noram5?

you could do the ddb equivalent of "wakeup(&uvmexp.free)",
i.e. "call wakeup(ADDR)" where ADDR is the value of "&uvmexp.free".

>   - If I wanted to change the kernel to every so often (30s?) issue a
>     wakeup to flt_noram5, where/how should I do this?  Or, should there
>     be a once/second that goes to the next process and wakes it up, as a
>     debug option?  Or, why am I wrong to want to do this?

there's no "next process", the pagedaemon always wakes up every thread
that has gone to sleep waiting for the pagedaemon to make some progress.
you could use a periodic wakeup as a debugging tool, sure.  but it's
usually enough to check the stack trace of the pagedaemon thread
to see if the problem is that the pagedaemon thread is hung.
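
The periodic-wakeup idea can be sketched in user space as follows. This is a toy model under stated assumptions, not kernel code: `FreePagePool`, `start_periodic_wakeup` and `demo` are hypothetical names, the 50 ms interval is arbitrary, and the model assumes that a woken thread simply retries its allocation. It illustrates two things from the text above: the wakeup goes to every sleeping thread (there is no "next" one), and a periodic wakeup lets sleepers notice pages that were freed without any signal.

```python
import threading
import time


class FreePagePool:
    """Toy model (not NetBSD code): threads sleep until they can get a page."""

    def __init__(self):
        self.cond = threading.Condition()
        self.free_pages = 0

    def alloc(self, timeout):
        """Sleep; on every wakeup re-check for a free page and take one."""
        with self.cond:
            ok = self.cond.wait_for(lambda: self.free_pages > 0, timeout)
            if ok:
                self.free_pages -= 1
            return ok

    def free_silently(self, n):
        """Pages become free, but nobody wakes the sleepers."""
        with self.cond:
            self.free_pages += n

    def wake_all(self):
        """Wake every sleeping thread, not just one of them."""
        with self.cond:
            self.cond.notify_all()


def start_periodic_wakeup(pool, interval, stop):
    """Debugging aid: wake all sleepers every `interval` seconds until stopped."""
    def loop():
        while not stop.wait(interval):
            pool.wake_all()
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t


def demo(n_waiters=3):
    """Sleepers get their pages shortly after a silent free, thanks to the timer."""
    pool, stop, results = FreePagePool(), threading.Event(), []
    start_periodic_wakeup(pool, 0.05, stop)
    threads = [threading.Thread(target=lambda: results.append(pool.alloc(5.0)))
               for _ in range(n_waiters)]
    for t in threads:
        t.start()
    time.sleep(0.2)               # give the waiters time to go to sleep
    pool.free_silently(n_waiters)
    for t in threads:
        t.join(2.0)               # far shorter than the 5 s alloc timeout
    stop.set()
    return results
```

As in the real system, this kind of timer only papers over a missing wakeup; if the thread that should be freeing pages is itself deadlocked, the sleepers wake up, find nothing to allocate, and go back to sleep.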

>   - Somehow, processes waiting on pools do not get woken up when
>     presumably the pool code was waiting on RAM, and RAM becomes
>     available.  Or at least it seems that way.  How is this supposed to
>     work?

the pagedaemon thread isn't supposed to get stuck in locking deadlocks.  :-)

>   - My belief is that even if zfs is piggy, the system should not lock
>     up, and that absent bugs I would be complaining "zfs piggyness leads
>     to paging out stuff and making the system slow" instead.  Correct?

yes, that is correct.

-Chuck
