Current-Users archive


Re: non-automated test failure report! :)



On Tue, 15 Nov 2011, Andreas Gustafsson wrote:

> It seems pretty clear to me what is happening: the test program is
> using a lot of memory and is being killed because the machine is
> running out of swap.
> 
> What's not clear to me is why it's failing in this particular way
> instead of e.g. malloc() returning NULL, and what to do about it.

This is happening because the kernel overcommits memory.  When a page is 
mapped into the process' address space, either through mmap(), sbrk(), or 
increasing the stack limit, the address space is increased but no 
resources are allocated or reserved.  Then later, when the process tries 
to access the page, it takes a page fault and the kernel looks to allocate 
a page frame for the process to use.  If the kernel can't free up a page 
frame, the only things it can do are block the process and hope something 
else frees up a page frame it can use to satisfy the page fault, or kill 
the process.
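
To illustrate the two phases described above, here is a minimal sketch in C. 
The function name reserve_and_touch() and its parameters are made up for 
this example; it assumes a POSIX-like system that provides MAP_ANON.  The 
mmap() call only grows the address space, and it is the later writes that 
take page faults and make the kernel find page frames.

```c
#define _DEFAULT_SOURCE	/* exposes MAP_ANON on glibc under strict -std=; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous memory, then write
 * to the first `touch_pages` pages of the mapping.
 * Returns the number of pages written, or -1 if mmap() itself fails.
 */
static long
reserve_and_touch(size_t len, size_t touch_pages)
{
	long page = sysconf(_SC_PAGESIZE);
	assert(page > 0);

	/*
	 * Phase 1: address space only.  On an overcommitting kernel this
	 * can succeed even when there is not enough RAM + swap behind it.
	 */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/*
	 * Phase 2: the first write to each page faults, and only now does
	 * the kernel have to come up with a page frame.
	 */
	long touched = 0;
	for (size_t off = 0; off < len && (size_t)touched < touch_pages;
	    off += (size_t)page) {
		p[off] = 1;
		touched++;
	}

	munmap(p, len);
	return touched;
}
```

If the sketch is called with a len far larger than RAM plus swap and with 
touch_pages covering the whole mapping, the failure would be expected to 
show up in the loop (as a blocked or killed process) rather than as an 
error return from mmap().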

A while back I looked into preventing overcommit by tracking overall 
address space allocation and comparing it to total swap space.  This would 
allow the kernel to return errors through the system call interface 
instead of just killing off processes.  However, page loaning made the 
accounting extremely difficult and I was unable to design something that 
could keep an accurate account of address space allocations.
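
As a rough illustration of the kind of accounting meant here, the toy 
sketch below keeps a running total of promised address space and refuses a 
reservation that would exceed a fixed limit.  This is not the design I was 
working on and not kernel code; struct commit_acct, commit_reserve() and 
commit_release() are hypothetical names, and the limit simply stands in for 
total swap space.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical global commitment counter. */
struct commit_acct {
	size_t limit;		/* bytes of backing store, e.g. total swap */
	size_t committed;	/* bytes of address space promised so far */
};

/*
 * Try to reserve `len` bytes.  Returns 0 on success, or ENOMEM if the
 * reservation would push the total past the limit.
 */
static int
commit_reserve(struct commit_acct *a, size_t len)
{
	assert(a->committed <= a->limit);	/* invariant */

	/* Written this way so that the comparison cannot overflow. */
	if (len > a->limit - a->committed)
		return ENOMEM;
	a->committed += len;
	return 0;
}

/* Give back `len` bytes when a mapping goes away. */
static void
commit_release(struct commit_acct *a, size_t len)
{
	a->committed -= (len <= a->committed) ? len : a->committed;
}
```

With something like this in front of mmap() or sbrk(), the caller would see 
ENOMEM at the system call instead of being killed at fault time.  The 
sketch assumes every page has exactly one owner to charge, which is the 
assumption that page loaning breaks.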

Right now the only solution to running low on swap is adding more swap 
space.  

Eduardo 

