tech-kern: Re: splx() optimization [was Re: SMP re-eetrancy in "bottom half"

Subject: Re: splx() optimization [was Re: SMP re-eetrancy in "bottom half"
To: None <jonathan@dsg.stanford.edu>
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
List: tech-kern
Date: 06/07/2005 18:35:39

> >tackle the highest IPL first.
> 
> Oh. I see. I had not grasped that part before.
> 
> That's certainly an internally-consistent strategy, but not at all
> where my interest and focus lies (i.e., networking code). And it
> quantizes improvements at the granularity of *all* code at a given IPL.

actually, you can do per-driver improvements for the highest IPL.

> Whereas the lock-per-IPL lets us proceed as in your patch from 2003.

my patch doesn't imply lock-per-IPL.

you can start to tackle your IPL_NET driver today.
however, you can't remove biglock for IPL_NET today.

we already have many spinlocks which are always grabbed within biglock.
(thus no actual contention.)   i don't think it's a problem to add
another spinlock for your driver.
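
to illustrate the nesting, here is a minimal userland sketch in C11.
it is not NetBSD kernel code: spin_model, foo_lock, foo_intr and
biglock_owned are hypothetical names made up for this example, and
the real kernel locking primitives look different.  it only shows
that a driver lock which is always taken inside biglock never finds
itself busy.

```c
#include <assert.h>
#include <stdatomic.h>

/* hypothetical userland model of a spinlock, not the kernel's own type */
struct spin_model {
	atomic_flag locked;
	unsigned long spins;	/* how often the lock was found busy */
};

static struct spin_model biglock = { ATOMIC_FLAG_INIT, 0 };
static struct spin_model foo_lock = { ATOMIC_FLAG_INIT, 0 };	/* per-driver */
static int biglock_owned;	/* only written while biglock is held */
static int foo_events;		/* driver state protected by foo_lock */

static void
spin_enter(struct spin_model *l)
{
	unsigned long busy = 0;

	while (atomic_flag_test_and_set_explicit(&l->locked,
	    memory_order_acquire))
		busy++;
	l->spins += busy;	/* safe: we hold the lock at this point */
}

static void
spin_exit(struct spin_model *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* hypothetical interrupt handler: driver lock nested inside biglock */
static void
foo_intr(void)
{
	spin_enter(&biglock);
	biglock_owned = 1;

	assert(biglock_owned);	/* invariant: foo_lock only under biglock */
	spin_enter(&foo_lock);
	foo_events++;
	spin_exit(&foo_lock);

	biglock_owned = 0;
	spin_exit(&biglock);
}
```

as long as every caller enters through biglock, foo_lock.spins stays
at zero.  the driver lock costs an extra atomic operation but already
documents what it protects, so it is in place when biglock goes away.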

you can even test your driver by flattening IPLs or introducing
lock-per-IPL *in your local tree*.

what's the problem?

> The hierarchy of IPL-bound locks lets us do that without flattening
> all device interrupts/IPLs to a single level (which is the part I
> think you described to Stefan as the "hack" part of that patch. ).

lock-per-IPL is a far worse hack than flattening IPLs, IMO.

> I'm also skeptical that we have developer resources to add explicit
> SMP locking across such large volumes of code as the "unit of least
> effort" or "least commit" whilst still maintaining the quality for
> which NetBSD is known.

you don't need to do such a large unit at once.
i don't think the "least commit" size is any different from
the lock-per-IPL case.

> Yamamoto-san, I envisage little or *no* changes to individual drivers
> in going from the approach I outline, to the one you suggest.
> If that's really so, where's the downside?

splxxx() functions are used for cpu-local synchronization and are expected
to be fast.  it's the fundamental semantics of them.
tlb shootdown code is a good example where the cpu-local behaviour is
appropriate.
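
a rough sketch of what "cpu-local" means here, as a plain C model.
it is an assumption-laden illustration, not any port's real
implementation: cpu_model, splraise_model and splx_model are
hypothetical names, and the IPL values are arbitrary.

```c
#include <assert.h>

/* hypothetical levels and cpu count, chosen only for this example */
enum { IPL_NONE = 0, IPL_NET = 1, IPL_HIGH = 2, NCPU = 2 };

/* one current level per cpu; only that cpu ever touches it */
struct cpu_model {
	int ilevel;
};

static struct cpu_model cpus[NCPU];

/* raise the level of one cpu and return the old one for splx_model() */
static int
splraise_model(struct cpu_model *ci, int nlevel)
{
	int olevel = ci->ilevel;

	assert(nlevel >= IPL_NONE && nlevel <= IPL_HIGH);
	if (nlevel > olevel)
		ci->ilevel = nlevel;	/* never lowers the level */
	return olevel;
}

/* restore the saved level: a plain store, no shared lock, no atomic op */
static void
splx_model(struct cpu_model *ci, int olevel)
{
	assert(olevel >= IPL_NONE && olevel <= IPL_HIGH);
	ci->ilevel = olevel;
}
```

in this model both operations touch only the calling cpu's own state.
binding a lock to each IPL would add a shared, possibly contended
acquire and release to each of these paths.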

it's ironic that this "making splxxx functions terribly slower" thing is
discussed with the subject "splx() optimization". :-(

YAMAMOTO Takashi