tech-kern archive


Re: Unexpected out of memory kills when running parallel find instances over millions of files



On Thu, Oct 19, 2023 at 10:49:37AM +0000, Michael van Elst wrote:
> mjguzik%gmail.com@localhost (Mateusz Guzik) writes:
> 
> >Running 20 find(1) instances, where each has a "private" tree with
> >millions of files, runs into trouble with the kernel killing them (and
> >others):
> >[   785.194378] UVM: pid 1998.1998 (find), uid 0 killed: out of swap
> 
> 
> >This should not be happening -- there is tons of reusable RAM as
> >virtually all of the vnodes getting here are immediately recyclable.
> 
> While vnodes would be recyclable, they hardly get recycled unless
> a filesystem object is deleted or the filesystem is unmounted.
> 

They get recycled all the time by the vdrain thread once numvnodes goes
above desiredvnodes, as it does in this test.
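
To make the mechanism concrete, here is a rough sketch of the vdrain
idea -- numvnodes and desiredvnodes are the real tunables, but the
helpers are made up and this is not the actual vfs_vnode.c code:

struct vnode;
extern int numvnodes, desiredvnodes;	/* types simplified */

static void
vdrain_sketch(void)
{
	struct vnode *vp;

	for (;;) {
		/* Sleep until the cache exceeds its target size. */
		while (numvnodes <= desiredvnodes)
			wait_for_allocation();	/* hypothetical */

		/* Recycle the oldest unreferenced vnode, if any. */
		vp = take_from_free_list();	/* hypothetical */
		if (vp != NULL)
			recycle_and_free(vp);	/* drops numvnodes */
	}
}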

Part of the problem is that with 20 processes walking the filesystem,
the vdrain thread is outpaced (20:1) and has no way to throttle the
walkers, while the memory allocator just keeps handing out new vnodes.
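
One way to picture the missing backpressure: if the allocation path
blocked once the cache ran far enough past its target, twenty walkers
could not outrun a single drain thread. Purely a sketch, none of these
helpers exist as such:

static struct vnode *
vnalloc_throttled_sketch(void)
{
	/*
	 * Today every caller effectively bumps numvnodes without
	 * looking back. Blocking here once the cache is well past
	 * its target would let vdrain catch up.
	 */
	while (numvnodes > desiredvnodes + desiredvnodes / 8)
		wait_for_vdrain();	/* wake vdrain, then sleep */

	return alloc_fresh_vnode();	/* hypothetical */
}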

> >Specs are 24 cores, 24G of RAM and ufs2 with noatime. swap is *not* configured.
> 
> Without swap, the kernel also has no chance to evict process pages
> to grow the vnode cache further.
> 

It should not be trying to grow the vnode cache. If anything, it should
keep the cache from blowing out of proportion, and it definitely should
not kill processes while swaths of immediately freeable vnodes are
sitting around.

As noted above, there is code (the vdrain thread) that tries to do this,
but it is not sufficient.

I tested several systems (Linux, all the BSDs, and even Illumos), and
only NetBSD fails to complete the run. That is to say, even OpenBSD
chugs along with no problem.

This is definitely a reliability problem in the kernel.

Traditionally, vnode allocation would recycle something from the "free"
list if need be. Perhaps restoring this behavior is the easiest way out
for the time being.
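
Roughly along these lines (again a sketch with made-up helpers, not a
patch):

static struct vnode *
getnewvnode_sketch(void)
{
	struct vnode *vp;

	/* At or past the target: reuse instead of allocating. */
	if (numvnodes >= desiredvnodes &&
	    (vp = take_from_free_list()) != NULL) {
		clean_old_identity(vp);	/* flush old inode state */
		return vp;		/* numvnodes stays flat */
	}

	/* Below target, or nothing reusable: allocate fresh. */
	return alloc_fresh_vnode();
}

That way memory use stays bounded by desiredvnodes no matter how many
walkers are running.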


