Subject: Re: vm size wrongly reported
To: Robert Elz <kre@munnari.OZ.AU>
From: Andrew Brown <atatat@atatdot.net>
List: current-users
Date: 05/04/2003 22:44:50
>  | >if we add them all up yes but just using the size count from vm_map should
>  | >give the exact size of the virtual address space,
>  | 
>  | so...it does, but that number is considerably larger than a given
>  | process's impact on the vm system, since a lot of that space is
>  | shared.
>
>I suspect you're confusing VM and real memory.   I'm not sure which it
>is you're wanting to count, but what's above makes no sense.

i suspect i'm just using terminology which is confusing to you, but i
think i know what i mean.  when you say "vm", do you mean the 32-bit
virtual address space that is presented to each process, or the total
amount of physmem and swap that the kernel is managing?

>Take the simplest possible example.   A process forks (not vfork, not that
>it matters).   The fork call has not yet actually returned in either process
>(some higher priority process took away the CPU) but it has otherwise
>finished - neither process has had a chance to touch a page.   Each process
>has identical process images (share every page).
>
>The total VM consumed by those two processes is twice that consumed by
>one of them (twice as much as was consumed before the fork).   That all
>of the pages are shared is irrelevant.   To calculate VM allocations all
>that should ever be done is to add the VM of every process in the system.

yes, from a certain point of view they each consume the same amount
of virtual memory, but the cost of the second one is merely the vm
structures that the kernel allocates to describe its address space
(at the instant right after the fork, of course).  to say that the
cost of each of them was equal would be, imho, wrong.
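
to illustrate (a minimal sketch, nothing netbsd-specific about it;
the 64mb buffer and the sleeps are arbitrary numbers i picked so you
have time to watch it from another window):

  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  int
  main(void)
  {
          size_t len = 64 * 1024 * 1024;  /* 64mb, arbitrary */
          char *buf = malloc(len);

          if (buf == NULL)
                  return 1;
          memset(buf, 1, len);            /* touch every page */

          switch (fork()) {
          case -1:
                  return 1;
          case 0:
                  /*
                   * child: ps now shows a vsz that includes all
                   * 64mb, but every page is still shared with the
                   * parent, so no new ram has been consumed yet.
                   */
                  sleep(30);
                  /*
                   * writing forces the copy-on-write; only now
                   * does the system hand out new pages.
                   */
                  memset(buf, 2, len);
                  sleep(30);
                  break;
          default:
                  sleep(60);              /* parent just idles */
                  break;
          }
          return 0;
  }

run it and watch both processes in top or ps: vsz doubles at the
fork, but free ram only drops once the child does its second memset
and the copy actually happens.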

if all you want is a simple metric for vm allocations, you would do
as well to merely use the vm_map->size member, since that counts the
total mapped size of a given process's address space, except that
number is probably "wrong" for most applications.  when i want to
know where my ram went, i look at ps for processes with either a
large vsz or a large rss.
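
for completeness, a process can also ask after itself; a tiny sketch
using getrusage (ru_maxrss is traditionally in kilobytes, and it's
the peak rss rather than the current one; it also counts shared pages
the same way ps does):

  #include <sys/time.h>
  #include <sys/resource.h>
  #include <stdio.h>

  int
  main(void)
  {
          struct rusage ru;

          if (getrusage(RUSAGE_SELF, &ru) == -1) {
                  perror("getrusage");
                  return 1;
          }
          /* peak resident set size, traditionally in kilobytes */
          printf("max rss: %ld kb\n", ru.ru_maxrss);
          return 0;
  }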

what i'm trying to get at is that things like libc's text are shared,
whereas libc's data segment is not, and stacks almost certainly aren't
(except in the fork case above, until one of the two processes writes
to the shared stack pages, at which point they get cow'ed).  yet
another process mapping in libc doesn't cost all that much, whereas
another mapping of equivalent size of something private can cost a
lot more.
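
to make that concrete, a sketch (the library path is made up, and the
MAP_ANON spelling varies between systems; the flags are the point):

  #include <fcntl.h>
  #include <sys/mman.h>
  #include <sys/stat.h>
  #include <unistd.h>

  int
  main(void)
  {
          struct stat st;
          void *text, *data;
          int fd;

          /* the path is made up; any shared library would do */
          fd = open("/path/to/libfoo.so", O_RDONLY);
          if (fd == -1 || fstat(fd, &st) == -1)
                  return 1;

          /*
           * a read-only file mapping, like libc's text: every
           * process that maps it shares the same physical pages,
           * so the nth mapping costs little beyond map entries.
           */
          text = mmap(NULL, (size_t)st.st_size, PROT_READ,
              MAP_PRIVATE, fd, 0);

          /*
           * an anonymous writable mapping, like a data segment or
           * heap: once touched, every page is this process's own.
           */
          data = mmap(NULL, 16 * 1024 * 1024,
              PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON,
              -1, 0);

          if (text == MAP_FAILED || data == MAP_FAILED)
                  return 1;
          pause();        /* go look at this process with ps */
          return 0;
  }

the two mappings add comparable amounts to vsz, but only the second
one can ever cost the system new pages.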

i can't say i know of a good metric for comparing things like this;
i can only think of reasons why any given one could be considered
incorrect.

>On the other hand, the real mem consumed is much harder to attribute to
>processes, simply adding it will give vastly wrong numbers for the totals.

yes, very wrong.

>In the above example, one might allocate all of the real mem to one of the
>two processes, and 0 to the other, or one might attempt to allocate it
>"fairly" (attributing half of the allocated mem to each process) - calculating
>the total is easy, ignore the processes, and count the allocated pages.

...each way being wrong for some reason to someone else, of course.

>Distributing the total amongst the users in some rational way is hard
>(because no two people will agree on what is rational, not because implementing
>an agreed method needs to necessarily be difficult).

your last bit about "rational" is what i was alluding to when i
mentioned different points of view.

-- 
|-----< "CODE WARRIOR" >-----|
codewarrior@daemon.org             * "ah!  i see you have the internet
twofsonet@graffiti.com (Andrew Brown)                that goes *ping*!"
werdna@squooshy.com       * "information is power -- share the wealth."