Subject: Re: PROT_EXEC mappings of vnodes -> VTEXT
To: Jason R Thorpe <thorpej@wasabisystems.com>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 10/29/2001 21:53:05
hi,

cool, I'm glad that this improves the situation so much.

but I really don't think we should allow unprivileged users
to cause any file they can read to become read-only just by
mapping it with PROT_EXEC.  the VTEXT flag was originally used
just for the read-only part; I was overloading it when I used it
to affect paging behaviour.

so I was thinking of adding a new vnode flag VEXECMAP, which would
be what the paging system would use, and going back to using VTEXT
only for its original purpose.  only execve() would set VTEXT, but all
PROT_EXEC mappings (from execve() or mmap()) would set VEXECMAP.
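
a rough sketch of that split, just to illustrate the idea.  this is a
userland toy model, not kernel code: the flag values, struct toy_vnode
and all of the toy_* function names are made up for the example, and
the real checks would live wherever the kernel currently tests VTEXT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, for illustration only (not the kernel's). */
#define VTEXT    0x0001  /* file is being executed: writes are denied */
#define VEXECMAP 0x0002  /* file has executable mappings: paging hint */

/* Toy stand-in for a vnode; only the flag word matters here. */
struct toy_vnode {
	unsigned int v_flag;
};

/* execve() path: the image is write-protected and is an exec mapping. */
static void
toy_mark_exec_image(struct toy_vnode *vp)
{
	vp->v_flag |= VTEXT | VEXECMAP;
}

/* mmap()/mprotect() path with PROT_EXEC: only the paging hint is set. */
static void
toy_mark_exec_mapping(struct toy_vnode *vp)
{
	vp->v_flag |= VEXECMAP;
}

/* The write-permission check would consult VTEXT only. */
static bool
toy_write_allowed(const struct toy_vnode *vp)
{
	return (vp->v_flag & VTEXT) == 0;
}

/* The paging system would consult VEXECMAP only. */
static bool
toy_pages_are_exec(const struct toy_vnode *vp)
{
	return (vp->v_flag & VEXECMAP) != 0;
}

/* Usage: a dlopen()'d library versus an execve()'d binary. */
void
toy_selfcheck(void)
{
	struct toy_vnode lib = { 0 };
	struct toy_vnode bin = { 0 };

	toy_mark_exec_mapping(&lib);   /* e.g. mmap(..., PROT_EXEC, ...) */
	toy_mark_exec_image(&bin);     /* e.g. execve() */

	assert(toy_write_allowed(&lib));   /* mapping does not lock the file */
	assert(toy_pages_are_exec(&lib));  /* but its pages count as exec */
	assert(!toy_write_allowed(&bin));  /* running image stays read-only */
	assert(toy_pages_are_exec(&bin));
}
```

the point of keeping the two bits separate is that a plain PROT_EXEC
mapping changes only how the pages are treated for replacement, and
never whether someone else can write to the file.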

also, putting "vnode" and "vtext" in the sysctl names was a mistake,
since they don't really mean much to people who don't work on the kernel.
I should have used "file" and maybe "exec".

I'll try to make these changes soon, unless you'd like to take care of it.

-Chuck


On Mon, Oct 29, 2001 at 03:01:38PM -0800, Jason R Thorpe wrote:
> So, Perry Metzger asked me to look into this today, and I cooked up
> a patch that does:
> 
> 	If a PROT_EXEC mapping for a vnode is established, mark
> 	the vnode as VTEXT.
> 
> 	If a VA range has its protection changed to include PROT_EXEC,
> 	mark all vnodes mapped by the VA range as VTEXT.
> 
> The upshot of this is that dynamically loaded objects (both normal
> shared libraries as well as other dlopen()'d objects) will be considered
> as "text" by the pagedaemon, like normal executable images are.  This
> changes how the pagedaemon considers the vnode's pages for replacement.
> 
> Perry reports that the behavior of his system under heavy filesystem
> I/O load is much better (that is, once he tunes vm.vtextmin up to
> around 30% or so).  Without my patch, tuning up vtextmin didn't make
> much difference for him, since libc, etc. were still getting tossed
> out when the file system wanted pages.
> 
> I'm going to go ahead and commit the patch -- eventually, we might want
> to consider entirely different page replacement algorithms, but this is
> a fair stop-gap for now, and seems logical when you consider that shlibs
> are effectively part of the executable images that are already marked as
> VTEXT by the kernel's exec code.
> 
> -- 
>         -- Jason R. Thorpe <thorpej@wasabisystems.com>