tech-kern archive
Re: smr(9) and pool_cache_set_smr(9)
> Date: Tue, 19 May 2026 21:10:08 +0000
> From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
>
> 2. We already have a few kernel APIs for algorithms of this type:
> [...]
> But what they all have in common is that waiting for a safe time to
> free is a synchronous blocking operation: pserialize_perform,
> psref_target_destroy, localcount_drain. That's adequate for some
> purposes, but for others, it would be nice to gather moribund
> resources in batches to free asynchronously.
On reflection, I realize this can't be quite right: a pool_cache(9)
with PR_PSERIALIZE should already free objects in batches, although
each time it frees a batch, it makes the caller wait for an xcall.
So the difference is presumably in:
- the cost of read sections (higher from additional barriers and
bookkeeping), vs
- the time from deletion to freeing (perhaps higher because there's no
immediate feedback from an xcall), or how much memory can accumulate in
batches not yet proven freeable, vs
- the latency of freeing a single batch (maybe lower on average, if we
can find a batch that can be safely freed without synchronously
waiting for an xcall or equivalent? dunno if this makes sense), vs
- the other computational cost of processing each batch (lower because
there's no xcall costing cycles and cache disruption on all CPUs).
I would be curious to see some quantitative visualization of how this
affects practical workloads, e.g. the difference between the same code
using pserialize_read_enter/exit and pool_cache PR_PSERIALIZE vs
smr_(lazy_)enter/exit and pool_cache_set_smr. I'm also curious to see
how smr_enter/exit compares to an ordinary reader/writer lock, because
membar_sync is expensive!
Incidentally, there is probably low-hanging fruit for reducing the
cost of pserialize_perform under load -- the current algorithm is
about as naive as it gets: every pserialize_perform triggers a
broadcast xcall. If one pserialize_perform is in progress waiting for
an xcall when two more pserialize_performs are requested, we could
probably safely serve those requests by a single additional xcall,
rather than two additional xcalls, after the first xcall has
completed.