Current-Users archive
Re: network outages
Michael van Elst <mlelstv%serpens.de@localhost> writes:
> On Sat, Mar 21, 2026 at 07:00:55PM -0400, Greg Troxel wrote:
>> I could see:
>>
>> - If TCP tries to allocate a buffer and it exceeds kern.sbmax, just
>> clamp it and proceed.
>
> Maybe. But I guess this was just seen as a configuration error, and
> the behaviour is already in the original 4.4BSD code.
>
>> - When writing sbmax, either don't allow it if it's too small for the
>> default values of send/recv, or shrink those values to fit (rounded
>> to 2048?). (Assume that this does not reduce already allocated
>> buffers.)
>
> In the original code, that's all compile time values. And with the
> sysctl knobs there is no individual handler routine, you just set
> a variable.
>
> The "default values" obviously do fit. This all just occurs
> when you, as a system administrator, tune it and set conflicting
> values. I don't think that's a problem.
I see it as a problem that a seemingly reasonable setting, on a machine
with plenty of memory, can leave that machine broken and perhaps
unreachable, due to a static limit.
I am not arguing for this being interactive, but as a thought
experiment, suppose:
- the user sets a 1 MB default send/recv space, and doesn't know or
  remember about sbmax
- the code creates a TCP connection, and runs into the 1 MB allocation
  not fitting in sbmax
[nonreality begins] system asks user: there's a global limit on socket
buffer space that we can't change right now. Which would you prefer:
- a) have the TCP socket creation fail
- b) have it work, but use the biggest socket buffer allowed by the
  global limit. Your connection will work, but it won't be able to
achieve the higher speeds you probably wanted
I would say that aside from perhaps people that are conducting tests,
everyone would choose b.
>> > The correction factor obviously should make kern.sbmax show the
>> > total memory used for a buffer including the headers. But it's
>> > just confusing.
>>
>> I don't quite follow. It seems sbmax is the memory for the mbufs and
>> the clusters, so the amount that one can store in clusters is less.
>
> An mbuf is a header plus a 512 byte (or maybe 256, depending on platform)
> data area. When you allocate a cluster (2048 bytes), the data area is
> left unused. So 1 mbuf with a cluster can store 2048 bytes but uses
> 2048 + 512 bytes. Multiply sbmax with the number of open sockets
> and you know how much precious kernel memory you need. But using
> sbmax as a boundary for tcp_sendspace needs some non-intuitive
> computation.
I thought it was (on vax :-) 256 bytes, of which the header was maybe 40
and the rest was available as a buffer if you don't attach a cluster
(vax, 1K), but that's a nit and maybe I'm off. (Understood it's bigger
now.)
I did understand from your previous comment that a storage limit of 256K
would lead to smaller usable buffer space. I didn't think about that
when finding the sendspace limit and thinking it odd.
Are you just pointing out that "bytes available for socket data
(sendspace)" is not the same as "bytes used for storing socket data
(sbmax)"? And thus that this situation is confusing?
Or are you suggesting bringing them into a common metric, or something
else?
> I also doubt that sbmax was ever seen as a boundary for tcp_sendspace,
> that's just a side effect. It's a stopgap to prevent applications from
> allocating huge socket buffers that exhaust kernel memory.
I can see the stopgap point historically. 256K was a high fraction of
the memory on a vax (1M to 32M, absent the late-stage monsters, iirc).
But we are in a world where NetBSD can run on 128M to 512G and picking a
fixed limit is tricky. I think it makes sense to make it a clamping
limit instead of a rejection limit.
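The clamping I have in mind could be as simple as the following sketch
(hypothetical helper, not actual kernel code; the 2048-byte rounding is
the one I suggested earlier and is an assumption, not current
behaviour):

```c
/* Illustrative cluster size; platform-dependent in reality. */
#define MCLBYTES 2048UL

/* Clamp a requested socket buffer size to the global limit instead of
 * rejecting it, rounding down so no partial cluster is promised. */
static unsigned long
sb_clamp(unsigned long requested, unsigned long sbmax)
{
    unsigned long size = requested < sbmax ? requested : sbmax;
    return (size / MCLBYTES) * MCLBYTES;
}
```

With this, the 1 MB-sendspace-vs-256K-sbmax scenario above silently
yields a working 256K buffer rather than a failed socket.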
Are there cases where people really want rejection, other than explicit
testing? (Specifically, cases where a write/read/check loop doesn't
work as a way to get pseudo-rejection when you're deliberately pushing
limits.)
I wonder how much the surprisingly low sbmax explains my long-term
perception that NetBSD TCP is slow and that auto buf sizing doesn't
really work.