Subject: Resolved: MySQL problem "Too many open files" or "Too many open files in system"
To: None <netbsd-users@netbsd.org>
From: Eric S. Hvozda <hvozda@ack.org>
List: netbsd-users
Date: 09/02/2002 01:33:43
Recently I ran into a very interesting problem with MySQL that was
not easy to debug; apparently others had run into it before but never
got a resolution.  So I post this in hopes it will help someone else
avoid the time it took me to find a workable solution.

While using MyPHPNuke for a discussion list, one developer complained
of getting "invalid MySQL result set" on some of his pages, and
"Error in accept: Too many open files" appeared in the MySQL logs.

It was fairly apparent this was a file descriptor problem, and after
looking it was easy to see that more were required.  "fstat | wc -l"
showed I was close to kern.maxfiles, so maxusers was increased from 32
to 128 (I was amazed I had gotten away with running maxusers=32 this
long with 105 users on the machine).  This solved the shortfall of
file descriptors, but did not change the behavior.
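
For reference, the check is simply a comparison of those two numbers;
maxusers itself is a kernel configuration option, so raising it means
building and booting a new kernel.  Roughly (a sketch):

# system-wide file descriptors in use vs. the kernel limit
fstat | wc -l
sysctl kern.maxfiles

# in the kernel config file, then config/build/reboot:
#   maxusers        128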

After some web surfing, those who had seen this before me mentioned
setting open-files-limit= for MySQL.  This did not change the
behavior.  "fstat | grep mysql | wc -l" showed we consistently ran out
of gas at 67 file descriptors (this is on NetBSD-1.5.2 i386).
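
For anyone who has not used that option, it is normally handed to
safe_mysqld or put in my.cnf; a rough sketch, with the exact option
spelling depending on your MySQL version:

# on the safe_mysqld command line ...
safe_mysqld --open-files-limit=1024 &

# ... or in /etc/my.cnf
[safe_mysqld]
open-files-limit = 1024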

After using sysctl to examine ...rlimit.descriptors.soft, it was
apparent that the server process (i.e. safe_mysqld) had too few
descriptors.  After adjusting both open-files-limit for MySQL and
...rlimit.descriptors.soft for the safe_mysqld process I was able
to eliminate the "Error in accept: Too many open files" message.
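
One way to make the sysctl side of that adjustment by hand is roughly
the following (a sketch; PID is a placeholder for the pid of the
running safe_mysqld process, and 1024 is simply a value comfortably
above what mysqld needed):

# raise the soft per-process descriptor limit on the server process
sysctl -w proc.PID.rlimit.descriptors.soft=1024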

However, the developer still reported getting "invalid MySQL result
set" on his pages, though less frequently and much more sporadically.
Looking at the log now showed "Error in accept: Too many open files
in system".

A quick look at "fstat | wc -l" showed less than 1/3 of the total
file descriptors in use.  "fstat | grep mysql | wc -l" showed we were
definitely under the per-process file descriptor limit as well.

After a lot of research and surfing, I came up empty.  I didn't know
what else to do, so I went to visit the developer.

It was apparent that after 3 or 4 page reloads, MySQL would tank
with "Error in accept: Too many open files in system" and would
require a restart.  After one of these restarts, the developer showed
me he was getting "invalid MySQL result set", yet there was no
message in MySQL's log about too many files at this point.

It appeared that the client process (which up to this point had been
totally ignored) might be running out of file descriptors itself.

(Theory: if enough clients hang due to low file descriptor resources,
the MySQL server will hang as well and falsely report "Error in
accept: Too many open files in system", a rather misleading message
for that condition.)
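
One rough way to sanity-check that theory while the pages are being
reloaded (a sketch; PID is a placeholder for the pid of a suspect
client, e.g. an httpd/PHP child):

# descriptors currently held by the client process
fstat -p PID | wc -l

# its soft per-process limit
sysctl proc.PID.rlimit.descriptors.soft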

Up to this point the per-process limits on file descriptors were
being altered with sysctl in .../share/mysql/mysql.server.

(/etc/login.conf works on ttys and doesn't help in the case where
MySQL is manipulated from cron.)
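
For reference, that adjustment in mysql.server amounted to something
like the following (a sketch; proc.curproc here is the shell running
the script, and the intent is that the raised limit be inherited by
the safe_mysqld it starts):

# near the top of .../share/mysql/mysql.server
sysctl -w proc.curproc.rlimit.descriptors.soft=1024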

This was fine for the MySQL server process, but for clients it was a
disaster.  While the server process was a single, known pid, the
clients could be any process.  What we really needed to do was alter
the default per-process soft file descriptor limit.

Some research with sysctl from csh and sh revealed that control of
...rlimit.descriptors.soft via sysctl did not appear to work reliably
or consistently:

sysctl proc.curproc.rlimit.descriptors.soft          (response: 64)
sysctl -w proc.curproc.rlimit.descriptors.soft=128   (response: 64 -> 128)
sysctl proc.curproc.rlimit.descriptors.soft          (response: 64 still!)

Therefore it was decided to relax the default limit in the kernel
itself rather than rely on values set via sysctl being inherited by
child processes.

Examination of the source tree led me to .../sys/kern/init_main.c;
part way down I saw what I was after and applied the following
patch:

*** init_main.c.dist    Sun May  6 11:22:50 2001
--- init_main.c Mon Sep  2 00:20:51 2002
***************
*** 299,306 ****
                    limit0.pl_rlimit[i].rlim_max = RLIM_INFINITY;

        limit0.pl_rlimit[RLIMIT_NOFILE].rlim_max = maxfiles;
!       limit0.pl_rlimit[RLIMIT_NOFILE].rlim_cur =
!           maxfiles < NOFILE ? maxfiles : NOFILE;

        limit0.pl_rlimit[RLIMIT_NPROC].rlim_max = maxproc;
        limit0.pl_rlimit[RLIMIT_NPROC].rlim_cur =
--- 299,305 ----
                    limit0.pl_rlimit[i].rlim_max = RLIM_INFINITY;

        limit0.pl_rlimit[RLIMIT_NOFILE].rlim_max = maxfiles;
!       limit0.pl_rlimit[RLIMIT_NOFILE].rlim_cur = 512;

        limit0.pl_rlimit[RLIMIT_NPROC].rlim_max = maxproc;
        limit0.pl_rlimit[RLIMIT_NPROC].rlim_cur =

After building and booting the new kernel, tests confirmed the bad
behavior was gone.  Apparently the client *was* running out of file
descriptors, which caused problems in the server and made it *falsely*
report "Error in accept: Too many open files in system".
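
A quick sanity check after the reboot (a sketch; barring login.conf
or shell overrides, every new process should now inherit the higher
soft limit):

sysctl proc.curproc.rlimit.descriptors.soft    (should now report 512)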

Hopefully this will help someone else avoid the huge debugging
session I went through...