Re: smr(9) and pool_cache_set_smr(9)

To: Kevin Bowling <kevin.bowling%kev009.com@localhost>
Subject: Re: smr(9) and pool_cache_set_smr(9)
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Date: Wed, 20 May 2026 15:01:43 +0000

> Date: Tue, 19 May 2026 21:10:08 +0000
> From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
> 
> 2. We already have a few kernel APIs for algorithms of this type:
> [...]
>    But what they all have in common is that waiting for a safe time to
>    free is a synchronous blocking operation: pserialize_perform,
>    psref_target_destroy, localcount_drain.  That's adequate for some
>    purposes, but for others, it would be nice to gather moribund
>    resources in batches to free asynchronously.

On reflection, I realize this can't be right, because a pool_cache(9)
with PR_PSERIALIZE should already free objects in batches, although
each time it chooses to free a batch, it holds up the caller while it
waits for an xcall.

So the difference is presumably in:

- the cost of read sections (higher from additional barriers and
  bookkeeping), vs

- the time from deletion to freeing (perhaps higher because there's no
  immediate feedback from xcall), or how much memory can grow due to
  batches not yet proven freeable, vs

- the latency of freeing a single batch (maybe lower on average, if we
  can find a batch that can be safely freed without synchronously
  waiting for an xcall or equivalent? dunno if this makes sense), vs

- the other computational cost of processing each batch (lower because
  there's no xcall costing cycles and cache disruption on all CPUs).

I would be curious to see some quantitative visualization of how this
affects practical workloads, e.g. the difference between the same code
using pserialize_read_enter/exit and pool_cache PR_PSERIALIZE vs
smr_(lazy_)enter/exit and pool_cache_set_smr.  I'm also curious to see
how smr_enter/exit compares to an ordinary reader/writer lock, because
membar_sync is expensive!

Incidentally, there is probably low-hanging fruit for reducing the
cost of pserialize_perform under load -- the current algorithm is
about as naive as it gets: every pserialize_perform triggers a
broadcast xcall.  If one pserialize_perform is in progress waiting for
an xcall when two more pserialize_performs are requested, we could
probably safely serve both of those requests with a single additional
xcall, issued after the first xcall has completed, rather than two
additional xcalls.
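
To make the coalescing idea concrete, here is a rough user-space model
of the bookkeeping.  It is only a sketch, not the pserialize(9)
implementation: struct gp_state and the gp_* functions are hypothetical
names invented for illustration, and "xcall" here is just a counter.
The assumption is that a request arriving while an xcall is already
running can't trust that xcall (it may already have visited some CPUs),
so it has to wait for the next one, but any number of such requests can
share that next one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the NetBSD kernel. */
struct gp_state {
    unsigned completed;     /* broadcast xcalls that have finished */
    unsigned wanted;        /* highest xcall number any waiter needs */
    bool in_progress;       /* is a broadcast xcall running now? */
    unsigned xcalls_issued; /* total xcalls started, for comparison */
};

static void
gp_start_xcall(struct gp_state *gp)
{
    gp->in_progress = true;
    gp->xcalls_issued++;
}

/*
 * Model one pserialize_perform-style request.  Returns the number of
 * the xcall whose completion satisfies this request.
 */
static unsigned
gp_request(struct gp_state *gp)
{
    unsigned target;

    if (gp->in_progress) {
        /* The running xcall may predate our update: need the next. */
        target = gp->completed + 2;
    } else {
        /* Idle: the xcall we start right now is good enough. */
        target = gp->completed + 1;
        gp_start_xcall(gp);
    }
    if (target > gp->wanted)
        gp->wanted = target;
    return target;
}

/* Model completion of the running xcall; restart only if still needed. */
static void
gp_xcall_done(struct gp_state *gp)
{
    gp->in_progress = false;
    gp->completed++;
    if (gp->wanted > gp->completed)
        gp_start_xcall(gp);
}

static bool
gp_satisfied(const struct gp_state *gp, unsigned target)
{
    return gp->completed >= target;
}

/* One request in flight, two more arrive: returns total xcalls issued. */
static unsigned
coalesced_demo(void)
{
    struct gp_state gp = { 0, 0, false, 0 };
    unsigned a = gp_request(&gp);   /* starts xcall 1 */
    unsigned b = gp_request(&gp);   /* must wait for xcall 2 */
    unsigned c = gp_request(&gp);   /* shares xcall 2 with b */

    gp_xcall_done(&gp);             /* xcall 1 done, xcall 2 starts */
    assert(gp_satisfied(&gp, a) && !gp_satisfied(&gp, b));
    gp_xcall_done(&gp);             /* xcall 2 done, nothing pending */
    assert(gp_satisfied(&gp, b) && gp_satisfied(&gp, c));
    return gp.xcalls_issued;
}
```

In this model coalesced_demo() issues two xcalls in total, where the
current one-xcall-per-request scheme would issue three.  A real version
would also need locking and a way for waiters to sleep, which the
sketch leaves out.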
