tech-kern archive
Re: continued zfs-related lockups
Following up on zfs-related lockups from netbsd-users@:
It's not just me. Multiple people have posted in previous replies
that they see this.
I see processes in flt_noram5 and they persistently remain there after
RAM becomes available.
I see processes in what I think is zilog (zfs intent log?). (ps and
ddb unhelpfully truncate these fields).
I see processes in zio_buf_? (again unhelpfully truncated).
There are some processes in tstile, but that just means they are
waiting for the same thing something else is waiting for, as I
understand it.
After reducing maxvnodes from 1600000 (the default value on a 32GB system)
to 500000, lockups are less frequent.
Lockups are provoked by programs that, on zfs, are:
- reading large amounts of data quickly
- deleting large numbers of files quickly
I think therefore we have multiple problems:
- zfs operations should block userland if resources are over threshold
(or more than X% over, if there is some background cleanup intended
to usually work without blocking)
- there is a bug with missing wakeups or some other locking problem
under memory pressure, that somehow only happens with zfs or pools.
(I'm saying pools because zfs allocates massive amounts of pool
storage, and that typically does not happen on non-zfs systems.)
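
To make the first point concrete, here is a rough sketch in plain C of the
kind of threshold check meant: nothing happens under the limit, background
cleanup is kicked when moderately over, and the caller blocks only when more
than X% over. Everything here (throttle_decide, the enum, the parameter
names) is hypothetical and for illustration; it is not existing zfs or
NetBSD kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policy outcomes; not taken from zfs or the NetBSD kernel. */
enum throttle_action {
	THROTTLE_NONE,		/* under the limit: proceed */
	THROTTLE_KICK_CLEANER,	/* over the limit, within slack: proceed, start cleanup */
	THROTTLE_BLOCK		/* more than slack_pct over the limit: make the caller wait */
};

/*
 * Decide what to do for a resource with a soft limit and a slack
 * percentage ("X% over").  used and limit are in the same units
 * (bytes, vnodes, ...).  Assumes limit is small enough that
 * limit plus the slack does not overflow uint64_t.
 */
enum throttle_action
throttle_decide(uint64_t used, uint64_t limit, unsigned slack_pct)
{
	uint64_t hard;

	assert(limit > 0);

	/* hard = limit * (100 + slack_pct) / 100, split to avoid overflow
	 * in the multiplication */
	hard = limit + (limit / 100) * slack_pct +
	    ((limit % 100) * slack_pct) / 100;

	if (used <= limit)
		return THROTTLE_NONE;
	if (used <= hard)
		return THROTTLE_KICK_CLEANER;
	return THROTTLE_BLOCK;
}
```

With a limit of 1000 and 10% slack, usage up to 1000 proceeds, 1001 to 1100
proceeds but asks for cleanup, and anything above 1100 blocks.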
Questions:
- Is anyone seeing lockups on netbsd-10, other than ones they think
they can pin on flaky hardware or accessing an odd device?
- Is there a way in ddb to issue a wakeup on flt_noram5?
- If I wanted to change the kernel to issue a wakeup to flt_noram5
every so often (30s?), where/how should I do this? Or, should there
be a once-per-second timer that goes to the next waiting process and
wakes it up, as a debug option? Or, why am I wrong to want to do this?
- Somehow, processes waiting on pools do not get woken up when
presumably the pool code was waiting on RAM, and RAM becomes
available. Or at least it seems that way. How is this supposed to
work?
- My belief is that even if zfs is piggy, the system should not lock
up, and that absent bugs I would be complaining "zfs piggyness leads
to paging out stuff and making the system slow" instead. Correct?
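
For reference, the periodic-wakeup idea can be modelled in userland with
POSIX threads: the waiter sleeps with a timeout and rechecks the condition
each time it wakes, so a wakeup that never arrives costs at most one poll
interval instead of hanging forever. This is only a model of the pattern,
not NetBSD kernel code; struct ram_gate and the ram_gate_* functions are
made-up names, and the do_wakeup flag exists purely to simulate the
suspected missing wakeup.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for "RAM is available"; not a kernel structure. */
struct ram_gate {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	bool		available;
};

void
ram_gate_init(struct ram_gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->cv, NULL);
	g->available = false;
}

/*
 * Mark RAM as available.  With do_wakeup false this models the
 * suspected bug: the state changes but no waiter is woken.
 */
void
ram_gate_release(struct ram_gate *g, bool do_wakeup)
{
	pthread_mutex_lock(&g->lock);
	g->available = true;
	if (do_wakeup)
		pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->lock);
}

/*
 * Wait for RAM, rechecking every poll_ms milliseconds and giving up
 * after max_polls timeouts.  Returns true if RAM was seen available;
 * *timeouts (if non-NULL) receives the number of timeouts taken.
 */
bool
ram_gate_wait(struct ram_gate *g, long poll_ms, int max_polls, int *timeouts)
{
	int n = 0;
	bool ok;

	assert(poll_ms > 0 && max_polls >= 0);

	pthread_mutex_lock(&g->lock);
	while (!g->available && n < max_polls) {
		struct timespec ts;

		/* Absolute deadline: now + poll_ms. */
		clock_gettime(CLOCK_REALTIME, &ts);
		ts.tv_sec += poll_ms / 1000;
		ts.tv_nsec += (poll_ms % 1000) * 1000000L;
		if (ts.tv_nsec >= 1000000000L) {
			ts.tv_sec++;
			ts.tv_nsec -= 1000000000L;
		}
		/* A timeout just sends us back to recheck the condition. */
		if (pthread_cond_timedwait(&g->cv, &g->lock, &ts) == ETIMEDOUT)
			n++;
	}
	ok = g->available;
	pthread_mutex_unlock(&g->lock);

	if (timeouts != NULL)
		*timeouts = n;
	return ok;
}
```

If another thread calls ram_gate_release(&g, false) while a waiter is
sleeping, the waiter still notices on its next timeout. A timeout like this
hides a lost wakeup rather than fixing it, which is why it is only
interesting as a debug option.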