current-users: Re: heavy use of mmap() & regex ?

Subject: Re: heavy use of mmap() & regex ?
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Tom Ivar Helbekkmo <tih+mail@Hamartun.Priv.NO>
List: current-users
Date: 05/01/1998 15:31:50

der Mouse <mouse@Rodents.Montreal.QC.CA> writes:

> I don't see mmap as having poor error handling, nor do I see why any
> single program would want to mix read/write and mmap.

The trouble with mmap and error situations should be obvious when you
compare it to read/write I/O.  When you read() and write(), you get
error returns that immediately tell you that this particular operation
failed, and why.  You can handle the error in an intelligent fashion.
With mmap, on the other hand, you can either get an error as you try
to access an object in memory, at which time you'll get a segfault (or
a bus error), and that's all you know about it, or you can _not_ get
the error then, because the problem actually strikes later, for
instance when msync() is called, causing an attempt to actually access
the file.  If you've performed several memory accesses, you don't know
which ones "took", and which didn't!
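
To make the contrast concrete, here is a minimal sketch.  The helper
names (update_with_write, update_with_mmap, demo_update) are made up
for illustration, and the demo assumes a local scratch file in /tmp:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: change file data with pwrite().  A failure is
 * reported by this call, for this buffer, with errno saying why. */
int update_with_write(int fd, off_t off, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;      /* caller can inspect errno and back out */
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: change file data through a MAP_SHARED mapping.
 * memcpy() cannot report an I/O error: a failure arrives either as a
 * signal during the copy, or as an msync() error that covers every
 * store made since the previous sync, not just this one. */
int update_with_mmap(void *map, size_t maplen, size_t off,
                     const void *buf, size_t len)
{
    if (off > maplen || len > maplen - off) {
        errno = EINVAL;
        return -1;
    }
    memcpy((char *)map + off, buf, len);
    return msync(map, maplen, MS_SYNC);
}

/* Exercise both helpers on a scratch file; returns 0 if all went well. */
int demo_update(void)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";
    char back[10];
    int fd = mkstemp(path);
    void *map;
    int rc = -1;

    assert(fd >= 0);
    unlink(path);           /* the file vanishes once fd is closed */
    if (ftruncate(fd, 4096) != 0) {
        close(fd);
        return -1;
    }
    map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    if (update_with_write(fd, 0, "hello", 5) == 0 &&
        update_with_mmap(map, 4096, 5, "world", 5) == 0 &&
        pread(fd, back, sizeof back, 0) == (ssize_t)sizeof back &&
        memcmp(back, "helloworld", 10) == 0)
        rc = 0;
    munmap(map, 4096);
    close(fd);
    return rc;
}
```

Note that a -1 from update_with_mmap() only says that something in
the whole mapping failed to reach the file; it cannot say which store.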

This is why people decide to mix read/write and mmap: when they really
need to know exactly what happened, and must be able to back out of
any kind of error situation properly, but very much want to use shared
memory mapped access anyway, they use mmap for reading, but use write()
to change data in the mapped file.  This works when your buffer cache
and your VM system are integrated, so long as the file being mapped is
not on an NFS mounted file system, in which case all bets are off
anyway.
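
A rough sketch of that split, under the same assumption of an
integrated buffer cache and a local file.  The record layout and the
store_* names are hypothetical, chosen only for the example:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout: a few fixed-size records in one file. */
enum { REC_SIZE = 16, NRECS = 8, FILE_SIZE = REC_SIZE * NRECS };

struct store {
    int fd;             /* used for every modification */
    const char *view;   /* read-only mapping, used for every lookup */
};

/* Map an already-open file of FILE_SIZE bytes read-only. */
int store_attach(struct store *s, int fd)
{
    void *m = mmap(NULL, FILE_SIZE, PROT_READ, MAP_SHARED, fd, 0);

    if (m == MAP_FAILED)
        return -1;
    s->fd = fd;
    s->view = m;
    return 0;
}

/* Lookups are plain memory reads from the mapping. */
const char *store_get(const struct store *s, int i)
{
    assert(i >= 0 && i < NRECS);
    return s->view + (size_t)i * REC_SIZE;
}

/* Updates go through pwrite(), so a failure is known right here.  The
 * mapping is PROT_READ, so no store can sneak in through memory.  A
 * short write is treated as a failure too. */
int store_put(struct store *s, int i, const char rec[REC_SIZE])
{
    ssize_t n;

    assert(i >= 0 && i < NRECS);
    n = pwrite(s->fd, rec, REC_SIZE, (off_t)i * REC_SIZE);
    return n == REC_SIZE ? 0 : -1;
}

/* Returns 0 if a record written with pwrite() shows up in the mapping. */
int demo_store(void)
{
    char path[] = "/tmp/store-demo-XXXXXX";
    char rec[REC_SIZE] = "record three";
    struct store s;
    int fd = mkstemp(path);
    int rc = -1;

    assert(fd >= 0);
    unlink(path);
    if (ftruncate(fd, FILE_SIZE) != 0 || store_attach(&s, fd) != 0) {
        close(fd);
        return -1;
    }
    /* Seeing the new data without msync() relies on the integrated
     * buffer cache; without one, the mapping may still show old data. */
    if (store_put(&s, 3, rec) == 0 &&
        memcmp(store_get(&s, 3), rec, REC_SIZE) == 0)
        rc = 0;
    munmap((void *)s.view, FILE_SIZE);
    close(fd);
    return rc;
}
```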

> The problem arises primarily when two programs try to operate on the
> same file at once, one using read/write and the other using mmap.

Yup.  And this is something you simply cannot allow with the current
implementation, unless both programs are disciplined and cooperative,
using some sort of common semaphore mechanism to make sure that the
one using read/write never gets in between memory writes and msync()
calls in the other, and that the one using mmap also always calls
msync() after data has been written with write(), before accessing the
data in memory.  Ugly, but possible.  Again, please note that even
with a buffer cache that is integrated with the VM system, you'd have
to do this if NFS is involved.
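
One way such a protocol might look, as a sketch only: it uses fcntl()
advisory locks as the "common semaphore mechanism", which is an
assumption for the example (any lock both programs honour would do),
and the function names are invented:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Take (F_WRLCK) or drop (F_UNLCK) an advisory lock on the whole file,
 * blocking until it is granted.  Advisory: both programs must use it. */
static int whole_file_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "through end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* The read/write program: only touches the file while holding the lock,
 * so it never lands between the other side's stores and its msync(). */
int rw_side_update(int fd, off_t off, const void *buf, size_t len)
{
    int rc = -1;

    if (whole_file_lock(fd, F_WRLCK) != 0)
        return -1;
    if (pwrite(fd, buf, len, off) == (ssize_t)len)
        rc = 0;
    if (whole_file_lock(fd, F_UNLCK) != 0)
        rc = -1;
    return rc;
}

/* The mmap program: msync() first to pick up what the other side wrote
 * with write(), then store, then msync() again before unlocking. */
int map_side_update(int fd, char *map, size_t maplen, size_t off,
                    const void *buf, size_t len)
{
    int rc = -1;

    if (off > maplen || len > maplen - off)
        return -1;
    if (whole_file_lock(fd, F_WRLCK) != 0)
        return -1;
    if (msync(map, maplen, MS_SYNC | MS_INVALIDATE) == 0) {
        memcpy(map + off, buf, len);
        rc = msync(map, maplen, MS_SYNC);
    }
    if (whole_file_lock(fd, F_UNLCK) != 0)
        rc = -1;
    return rc;
}

/* Runs both sides in one process (so the locks never actually block)
 * and checks that the file ends up with both updates.  Returns 0 if so. */
int demo_locked(void)
{
    char path[] = "/tmp/lock-demo-XXXXXX";
    char back[8];
    int fd = mkstemp(path);
    char *map;
    int rc = -1;

    assert(fd >= 0);
    unlink(path);
    if (ftruncate(fd, 4096) != 0) {
        close(fd);
        return -1;
    }
    map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    if (rw_side_update(fd, 0, "AAAA", 4) == 0 &&
        map_side_update(fd, map, 4096, 4, "BBBB", 4) == 0 &&
        pread(fd, back, sizeof back, 0) == (ssize_t)sizeof back &&
        memcmp(back, "AAAABBBB", 8) == 0)
        rc = 0;
    munmap(map, 4096);
    close(fd);
    return rc;
}
```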

An example of this way of doing things is what the Cyrus IMAP system
from CMU does on systems without integrated buffer caches: writes are
done atomically, using locks, and access to the memory mapped data is
preceded by stat() calls to check if the mtime of the mapped file has
changed, in which case msync() is called.  No data is ever changed by
modifying it directly in the mapped region.  (I'm grateful to John
Friend, one of the primary Cyrus architects, now at Netscape, for
taking the time to explain all this to me when I questioned his way
of doing things.  He taught me most of what I've explained above.)
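
The stat()-then-msync() check described above could be sketched like
this.  This is not Cyrus code: the struct, the function names, and the
choice of MS_INVALIDATE are assumptions made for the example, and the
demo pushes the mtime forward by hand so that it does not depend on
timestamp granularity:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

struct mapped_view {
    const char *path;   /* file behind the mapping */
    void *map;          /* PROT_READ mapping, never written through */
    size_t len;
    time_t mtime;       /* mtime seen at the last refresh */
};

/* Call before reading from v->map.  Returns 1 if the file had changed
 * and the mapping was resynced, 0 if unchanged, -1 on error. */
int view_refresh(struct mapped_view *v)
{
    struct stat st;

    if (stat(v->path, &st) != 0)
        return -1;
    if (st.st_mtime == v->mtime)
        return 0;
    if (msync(v->map, v->len, MS_SYNC | MS_INVALIDATE) != 0)
        return -1;
    v->mtime = st.st_mtime;
    return 1;
}

/* Returns 0 if a change made with pwrite() is noticed exactly once. */
int demo_view(void)
{
    char path[] = "/tmp/view-demo-XXXXXX";
    struct mapped_view v;
    struct timespec ts[2];
    struct stat st;
    int fd = mkstemp(path);
    int rc = -1;

    assert(fd >= 0);
    if (ftruncate(fd, 4096) != 0 || fstat(fd, &st) != 0)
        goto out;
    v.path = path;
    v.len = 4096;
    v.mtime = st.st_mtime;
    v.map = mmap(NULL, v.len, PROT_READ, MAP_SHARED, fd, 0);
    if (v.map == MAP_FAILED)
        goto out;

    if (view_refresh(&v) != 0)          /* nothing has changed yet */
        goto unmap;
    if (pwrite(fd, "new", 3, 0) != 3)   /* the writer changes the file */
        goto unmap;
    /* Force a visibly later mtime; two changes inside one timestamp
     * tick would otherwise be indistinguishable to this check. */
    ts[0].tv_sec = 0;
    ts[0].tv_nsec = UTIME_OMIT;         /* leave atime alone */
    ts[1].tv_sec = st.st_mtime + 5;
    ts[1].tv_nsec = 0;
    if (futimens(fd, ts) != 0)
        goto unmap;
    if (view_refresh(&v) == 1 && view_refresh(&v) == 0 &&
        memcmp(v.map, "new", 3) == 0)
        rc = 0;
unmap:
    munmap(v.map, v.len);
out:
    close(fd);
    unlink(path);
    return rc;
}
```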

> How *can* it be possible to "fully separate" the two access methods,

Discipline?  :-)

A really safe mmap access method is conceivable: it would involve
special handling of the mapped memory region, making all access to it
actually pass through the low level I/O system, coupled with a really
good signal-based error reporting and handling system.  This would
make it less efficient (although I don't know by how much), and would
be unwanted overhead for many uses of mmap, so should probably be an
option to enable when needed.
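
Nothing like that exists as an option, but the crude user-level
approximation is to catch the fault yourself.  The sketch below
assumes a single-threaded program and a system that delivers SIGBUS
(or SIGSEGV) for a failed access to a mapped file; the function names
are invented.  It shows how little the signal tells you: only that
some access inside the guarded copy failed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t guard_active;

static void on_fault(int sig)
{
    if (guard_active)
        siglongjmp(fault_env, 1);   /* unwind out of the guarded copy */
    /* A fault anywhere else is a real bug: die the default way. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Copy len bytes out of a mapped region.  Returns 0 on success, or -1
 * if touching the mapping raised SIGBUS or SIGSEGV; in that case dst
 * may be partially filled.  Not thread-safe. */
int guarded_copy_from_map(void *dst, const void *src, size_t len)
{
    const volatile unsigned char *s = src;  /* volatile: really read it */
    unsigned char *d = dst;
    struct sigaction sa, old_bus, old_segv;
    size_t i;
    int rc = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    if (sigsetjmp(fault_env, 1) == 0) {
        guard_active = 1;
        for (i = 0; i < len; i++)
            d[i] = s[i];
    } else {
        rc = -1;                    /* we got here via siglongjmp() */
    }
    guard_active = 0;
    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return rc;
}

/* Map a one-page file, then truncate it underneath the mapping so the
 * next access faults.  Returns 0 if the fault was caught and reported. */
int demo_guard(void)
{
    char path[] = "/tmp/guard-demo-XXXXXX";
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char first[4];
    int fd = mkstemp(path);
    void *map;
    int rc = -1;

    assert(fd >= 0);
    unlink(path);
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    if (guarded_copy_from_map(first, map, sizeof first) == 0 &&
        ftruncate(fd, 0) == 0 &&
        guarded_copy_from_map(first, map, sizeof first) == -1)
        rc = 0;
    munmap(map, len);
    close(fd);
    return rc;
}
```

Compare this with an errno from write(): the handler learns neither
why the access failed nor how much of the copy completed.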

Of course, error handling in C under UNIX really sucks anyway.  ;-)

-tih
-- 
Popularity is the hallmark of mediocrity.  --Niles Crane, "Frasier"