tech-kern archive


Re: Unexpected out of memory kills when running parallel find instances over millions of files



I think I reported this something like 20 years ago, but no one really seemed to care. I noticed it pretty much right away after NetBSD switched to the unified memory thing, where all free memory usually was grabbed as disk cache. It was not fun on VAX, but at the time it seems other platforms didn't suffer enough to consider it a problem. I guess over time it's just gotten worse...

  Johnny

On 2023-10-21 13:01, Manuel Bouyer wrote:
On Fri, Oct 20, 2023 at 10:26:05PM +0200, Reinoud Zandijk wrote:
Hi,

On Thu, Oct 19, 2023 at 11:20:02AM +0200, Mateusz Guzik wrote:
Running 20 find(1) instances, where each has a "private" tree with
millions of files runs into trouble with the kernel killing them (and
others):
[   785.194378] UVM: pid 1998.1998 (find), uid 0 killed: out of swap
[   785.194378] UVM: pid 2010.2010 (find), uid 0 killed: out of swap
[   785.224675] UVM: pid 1771.1771 (top), uid 0 killed: out of swap
[   785.285291] UVM: pid 1960.1960 (zsh), uid 0 killed: out of swap
[   785.376172] UVM: pid 2013.2013 (find), uid 0 killed: out of swap
[   785.416572] UVM: pid 1760.1760 (find), uid 0 killed: out of swap
[   785.416572] UVM: pid 1683.1683 (tmux), uid 0 killed: out of swap
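
For anyone collecting such reports, here is a minimal sketch of tallying which commands were killed, using only the messages quoted above as input. The regular expression is inferred from these seven lines rather than from the kernel source, and the names KILL_RE, parse_kills and summarize are hypothetical. Reading the number after the dot as an LWP id is an assumption.

```python
import re
from collections import Counter

# The seven kernel messages quoted in this thread, verbatim.
SAMPLE = """\
[   785.194378] UVM: pid 1998.1998 (find), uid 0 killed: out of swap
[   785.194378] UVM: pid 2010.2010 (find), uid 0 killed: out of swap
[   785.224675] UVM: pid 1771.1771 (top), uid 0 killed: out of swap
[   785.285291] UVM: pid 1960.1960 (zsh), uid 0 killed: out of swap
[   785.376172] UVM: pid 2013.2013 (find), uid 0 killed: out of swap
[   785.416572] UVM: pid 1760.1760 (find), uid 0 killed: out of swap
[   785.416572] UVM: pid 1683.1683 (tmux), uid 0 killed: out of swap
"""

# Pattern inferred from the lines above (an assumption, not taken from
# the kernel source). "lid" assumes the second number is the LWP id.
KILL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+UVM: pid (?P<pid>\d+)\.(?P<lid>\d+) "
    r"\((?P<comm>[^)]+)\), uid (?P<uid>\d+) killed: (?P<reason>.+)$"
)


def parse_kills(lines):
    """Return one dict per line that matches KILL_RE, in input order."""
    kills = []
    for line in lines:
        m = KILL_RE.search(line)
        if m is None:
            continue  # not a UVM kill message; skip it
        kills.append({
            "ts": float(m.group("ts")),
            "pid": int(m.group("pid")),
            "lid": int(m.group("lid")),
            "comm": m.group("comm"),
            "uid": int(m.group("uid")),
            "reason": m.group("reason"),
        })
    return kills


def summarize(kills):
    """Count kills per command name."""
    return Counter(k["comm"] for k in kills)


if __name__ == "__main__":
    for comm, count in summarize(parse_kills(SAMPLE.splitlines())).most_common():
        print(comm, count)
```

On this sample the tally shows that three of the seven victims (top, zsh, tmux) were not find at all, which is the "and others" part of the report.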

This should not be happening -- there is tons of reusable RAM as
virtually all of the vnodes getting here are immediately recyclable.

$elsewhere I got a report of a workload with hundreds of millions of
files which get walked in parallel -- a number high enough that it
does not fit in RAM on boxes which run it. Out of curiosity I figured
I'd check how others are doing on that front, but the key point is that
this is not a made-up problem.

I can second that. I have had UVM killing my X11 when visiting millions of
files; it might have been using rump but I am not sure.

What struck me was that swap was maxed out but systat showed something like
40 GB as `File'. I haven't looked at the Meta percentage but it wouldn't
surprise me if that was also high. Just some random snippet:

I've seen it too, although it didn't end up killing processes.
But the nightly jobs (usual daily/security + backup) end up pushing lots
of processes to swap, while the file cache grows to more than half the
RAM (I have 16 GB). As a result the machine is really slow and none of the
nightly jobs complete before morning.

Decreasing kern.maxvnodes helps a lot.


--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt%softjar.se@localhost             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

