Subject: Re: Increasing SHMMAXPGS
To: Curt Sampson <cjs@cynic.net>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 07/07/2002 10:18:06
On Fri, Jul 05, 2002 at 04:28:24PM +0900, Curt Sampson wrote:
> So I just had a look at this, and read the old thread by this name on
> tech-kern, and here's sort of a summary:
>
> 1. Removing segments whose creator has died/crashed/whatever should
> be an option that defaults to "off" in all cases. Systems such as
> PostgreSQL actually use information from old segments to help clean up
> after a crash.

if there really is a need for such an option, I'd agree it should certainly
default to "off".


> 2. There's no reason for SHMMAXPGS at all under the new VM system; these
> are just normal memory pages backed with an anonymous pager, and use up
> resources in the same way.

there does need to be *some* limit on the amount of memory allocatable
in SHM segments.  it probably ought to be based on the amount of RAM+swap
available at the time shmat() is called rather than a compile-time constant,
though.


> 3. We probably do want some sort of limit on how many shm pages a
> process can allocate in total, for the same reason we want that limit
> for anything else. I note (via experimentation) that mmap applies the
> datasize limit to anonymous memory (i.e., backed by swap, not a file),
> though not to file maps. So it seems reasonable, since shared memory is
> basically exactly the same thing (except that other processes can attach
> these "anonymous" regions), that we should apply the same limit.

since SHM segments actually do have names and they exist independently
of mappings in any process, I'll argue that they're really more like
files than anonymous memory.  SHM segments also have owners, permissions,
timestamps, etc.  the SHM namespace is basically a virtual-memory-backed
file system with a flat namespace, so it would really make more sense
to think of it like a file system.


> 4. The tricky part here is that shared memory persists after a process
> is gone. If an administrator finds his box running out of memory, he
> might go and kill some memory-hogging processes and discover that he's
> still out of memory, and not think to do an ipcs to find out how much of
> that is "dead" SysV shared memory.
> 
> The best way I can think of to deal with this is to copy FreeBSD's
> kern.ipc.shmmax sysctl (which is a global maximum for sysv shared memory
> segments), and set that to some reasonable value, say, half of RAM,
> and also (if it's not there already) account segment creation towards
> the process's datasize limit. That way admins who aren't shared memory
> clueful will not have too much damage done to their systems, and those
> who are can crank up the control (or set it to -1, meaning no limit) and
> get all the shared memory they need. But this seems to me still slightly
> kludgy, so I'm open to other suggestions.

the global limit is fine; it limits the size of the SHM "file system".
there are already various global limits:

# ipcs -M
shminfo:
        shmmax: 8388608 (max shared memory segment size)
        shmmin:       1 (min shared memory segment size)
        shmmni:     128 (max number of shared memory identifiers)
        shmseg:     128 (max shared memory segments per process)
        shmall:    2048 (max amount of shared memory in pages)

making these adjustable via sysctl would be good.
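hypothetical usage once they're exported (FreeBSD-style names shown; the actual NetBSD names would be whatever we pick):

```shell
# read the current segment-size limit
sysctl kern.ipc.shmmax
# raise it to 32 MB
sysctl -w kern.ipc.shmmax=33554432
# allow more segment identifiers
sysctl -w kern.ipc.shmmni=256
```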

probably the "shmall" limit should default to 0, which would just
leave the RAM+swap-at-the-moment check.  same for "shmmax".

trying to apply a per-process limit to SHM creation doesn't make a lot of
sense, since a program could trivially work around that by forking and
having its children create whatever segments they want, then just
attaching to those already-existing segments.  processes are really
just mapping these SHM "files".


> 5. Some programs, it appears to me, don't expect shared memory to be
> pageable. A particular example would be database programs, which use
> shared memory for buffering blocks from disk. Needless to say, the naive
> user who cranks up his shared memory segments to be most of system RAM
> in the hope of improving his Oracle buffering performance is in for a
> bit of a surprise!
> 
> FreeBSD solves this by adding a kern.ipc.shm_use_phys sysctl which makes
> all shm segments allocated from that point on non-pageable. I can't say
> I particularly like this solution, but it works, and I can't think
> of anything else better enough to justify not being compatible with
> FreeBSD. Presumably it would account towards the RLIMIT_MEMLOCK limits.
> I could take a look at what would be involved in implementing this, if
> anyone is interested.

if you really want to do this, I'd suggest having that flag just trigger an
implicit mlock() after a segment is attached.  that should do exactly
what you'd want, including enforcing limits on locked memory.

-Chuck


> Anyway, thoughts? If I just went ahead and did some or all of this,
> would anyone object?
> 
> Note that this doesn't deal with any of the limits related to semaphores
> or any of that, but I don't really feel up to taking that on right now.
> 
> cjs
> -- 
> Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
>     Don't you know, in this new Dark Age, we're all light.  --XTC
>