
Re: memory-mapped I/O (was: Re: Removing ARCNET stuffs)



(omnibus reply to both emails)

On Wed, Jun 03, 2015 at 03:28:49PM -0700, Greg A. Woods wrote:
 > > > Of course if one were actually implementing for a virtual memory based
 > > > system like Multics then one would not use silly old stdio and other
 > > > file stream abstractions -- there's just no point to contort all one's
 > > > algorithms into that corset of an abstraction.
 > > 
 > > We've had mmap in Unix for going on 30 years, and it remains a poor
 > > second cousin. At least for writing. Coincidence? You decide.
 > 
 > Yeah, well a proper from-the-ground-up universally virtual-memory
 > operating system has some major advantages over Unix with mmap, not to
 > mention that mmap itself wasn't available in widely used Unix systems
 > until somewhat more recently, and still seems to suffer
 > incompatibilities.  I.e. mmap specifically on Unix doesn't loosen the
 > corset very much at all -- it's just not worth the effort to use it
 > vs. using stream I/O.

Recently as in when machvm was developed in the 80s? :-)  Well, ok,
broad adoption was a bit later. But time does go on; it's been a while
now.

Anyway, I'm not sure how it would be markedly different in a freshly-
designed system; but I've never actually used Multics (or any of the
single-address-space research systems from the 90s either), only read
the papers.

 > You just ask the filesystem what region of memory is represented by some
 > name you know it as and if you can have access to it, and the filesystem
 > and memory manager arrange it so that your process won't get a SIGSEGV
 > when you access that memory.  That file is (and always was and always
 > will be) just an array in memory.

Right, but I could write you a library function that combines open and
fstat and mmap in Unix, and give you that interface too. The reason
people don't (except sometimes read-only) is that changing the size
of the object is a major pain -- at a minimum you need to (separately,
non-atomically) make a system call to extend it and then write data
into the newly extended space; but extending it might require moving
it to a different address.
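
For concreteness, here's roughly what such a helper might look like
-- a minimal, read-only sketch with simple-minded error handling
(map_file is a made-up name, not an existing API):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Map an entire existing file read-only; returns NULL on failure.
 * (Zero-length files will fail; simple-minded on purpose.) */
void *
map_file(const char *path, size_t *lenp)
{
        struct stat st;
        void *p;
        int fd;

        fd = open(path, O_RDONLY);
        if (fd < 0)
                return NULL;
        if (fstat(fd, &st) < 0) {
                close(fd);
                return NULL;
        }
        p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED,
            fd, 0);
        close(fd);              /* the mapping survives the close */
        if (p == MAP_FAILED)
                return NULL;
        *lenp = (size_t)st.st_size;
        return p;
}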

I think the typical way the single-address-space research systems
dealt with this was by never extending objects but creating new ones
(at new addresses) instead; but this creates major complications if
anyone else is trying to use the "same" object at the same time you're
writing to it.

Either way it doesn't work too well in practice: in most software,
most files are written out once, a chunk at a time, and never updated
afterwards, so nearly all writes need to go through the change-size
(or new-object) logic, and that gets expensive and messy.
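
To illustrate, the extend-and-remap dance on Unix looks roughly like
this -- two separate, non-atomic system calls, and the new mapping
may land at a different address, invalidating every existing pointer
into the old one (grow_mapping is a made-up name):

#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

void *
grow_mapping(int fd, void *old, size_t oldlen, size_t newlen)
{
        void *p;

        if (ftruncate(fd, (off_t)newlen) < 0)   /* 1: extend the file */
                return NULL;
        if (munmap(old, oldlen) < 0)            /* 2: drop the old view */
                return NULL;
        p = mmap(NULL, newlen, PROT_READ | PROT_WRITE,
            MAP_SHARED, fd, 0);                 /* 3: map the new size */
        return p == MAP_FAILED ? NULL : p;
}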

(Always knowing the size of the object you want to write out when you
open it is an option, but also not a very good or very compatible
one.)

 > We can do this in some sense with mmap, but with true universal virtual
 > memory there are no issues with trying to maintain consistency between
 > the stream and mmap interface.  There are no issues with the size of the
 > file vs. limits in the virtual size of your process.  There is no issue
 > with tracking if either swap or some specific filesystem is backing the
 > region (it's always _the_ filesystem).  You don't even have to close the
 > file, or give up access to one large file in order to access another
 > large file.  In theory you can access _all_ the files in the whole
 > system simultaneously.  _Everything_ is just, and always, a memory
 > address (except again for serialised devices -- they are actually other
 > programs that you talk to and which use device-independent protocols to
 > provide and manage access to the underlying hardware devices).

All of this is backend stuff that doesn't affect applications, and
that is well under control in deployed Unixes (other than OpenBSD)
these days. Modulo the size of virtual address space vs. the size of
your files; but don't use 32-bit hardware and it's ok.

Yet even though it all works, actual uses of read-write memory-mapped
I/O are pretty rare.

 > One of the oft-heard arguments against using mmap more in Unix is that
 > copying data in memory isn't as expensive as one might think vs. the
 > overheads of virtual memory management, and it's true in Unix but that's
 > not really the point if your whole system deals only via virtual memory
 > for all data on secondary storage devices.

That isn't actually an argument :-)

(the relative overhead of copying a page of memory vs. (re)mapping it
has changed over time and will probably change more, as the
relationship of memory bandwidth to caching and TLB effects shifts)

 > Just getting
 > rid of the overhead of a separate user-level ld.so process running for
 > every exec() can pay huge dividends, for example.

Huh? That's done by mapping ld.so, and it's neither separate nor a
process.

 > So, in fact a file was just a segment (with a name in the hierarchical
 > filesystem), and with write permission on the segment you could simply
 > write anywhere in it using a pointer (and addressing wrapped around, so
 > if you had a wild pointer it couldn't escape that segment, and since the
 > stack was just another segment, stacks easily grew up (and are not
 > executable) so basic buffer overflows are next to useless to an attacker).

So all segments had maximal size, and you just don't materialize pages
within them that have never been touched? That gets rid of the problem
with extending objects... at the cost of not knowing the object size,
and more importantly putting a very hard limit on the size of the
objects you can have and how many you can have at once.
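
Unix can demonstrate the touch-to-materialize half of that already:
anonymous mappings are demand-zero, so only the pages you actually
touch ever get memory. A sketch (on a system that overcommits,
reserving a huge range like this is nearly free; strict-overcommit
configurations may refuse it):

#include <stdio.h>
#include <sys/mman.h>

int
main(void)
{
        size_t len = (size_t)1 << 40;   /* reserve 1 TB of address space */
        char *seg = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE, -1, 0);

        if (seg == MAP_FAILED)
                return 1;
        seg[0] = 1;             /* faults in exactly one page */
        seg[len - 1] = 1;       /* ...and one more at the far end */
        printf("reserved %zu bytes, touched two pages\n", len);
        return 0;
}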

Neither the 286 nor the 386 would be usable this way: the 286 because
your objects would be limited to 64K each; both because you only get
(I think) 16384 segments, some of which need to be used for system
purposes; and the 386 also because the (dumb) way it maps segments
into memory means that one potentially-maximal-sized segment fills
the whole linear address space.

If you presume that segment numbers are per-process and there are
separate, wider, global internal identifiers (as opposed to the
single-address-space design, where the address of a mapped object is
global), and you assume you don't need more than, say, a couple
thousand objects at once (although if you look at the memory map of a
large application these days you may find it's got more regions than
that), then dividing the available virtual address space on an amd64
that many ways gives you a file size limit that's not big enough for
serious use today. You'd need a hypothetical architecture with a
segment register plus a 64-bit address space (complete with page
tables) in each segment... I suppose that's possible, but it doesn't
exist.
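
(For concreteness: with the usual 48-bit amd64 virtual address space
and, say, 2^11 objects, each object gets at most 2^48 / 2^11 = 2^37
bytes, i.e. 128 GB -- and that's before reserving anything for the
system.)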

Also, statically partitioning the address space like this to allow for
a few huge files (even though almost all files are small) is stupid.

So while it's a way to get around the issue of extending objects, it's
not a good one.

 > Multics PL/1 even had a special feature that allowed you to use the
 > dynamic linker to attach a named data segment as an "external static"
 > variable with the same name.  Storage for the variable literally _is_ the
 > content of the file with the same name -- it's automatically attached to
 > your address space at run time, paged in on reference (and out on
 > modification), i.e. directly accessed as memory using the data type the
 > variable is declared as.  Not as flexible as using a string to look up a
 > segment by name and then attaching that to a pointer, but still a really
 > cool feature to make programming easier in some circumstances.

This is easy to do in Unix (not quite as seamless, but you can stuff
all the logic in a library and never look at it; a sketch of such a
library follows the list below) -- yet nobody does so. Why? Partly
because of the size issue, but more because of real problems updating
such objects, like:
   - when do the updates get written to disk?
   - at what granularity do the updates get written to disk?
   - in what order do the updates get written to disk, relative to the
     order I made them in?
   - if the user ^C's the program while it's partway through updating
     the object, so the object state is invalid, how do we recover on
     the next execution?
   - if the system crashes, so the object state reverts to the last
     state that was written to disk, how do we either ensure that
     state is valid or recover?
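
To make those questions concrete, here's roughly what such a library
might look like (attach_static and struct config are made-up names):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* A variable whose storage is the content of a file, a la the
 * Multics "external static" feature described above. */
struct config {
        int counter;
        char name[64];
};

struct config *
attach_static(const char *path)
{
        struct config *p;
        int fd;

        fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0)
                return NULL;
        if (ftruncate(fd, (off_t)sizeof(struct config)) < 0) {
                close(fd);
                return NULL;
        }
        p = mmap(NULL, sizeof(struct config), PROT_READ | PROT_WRITE,
            MAP_SHARED, fd, 0);
        close(fd);
        return p == MAP_FAILED ? NULL : p;
}

Once attached, a caller's cf->counter++ updates the file...
eventually. msync(2) gives you some say over "when", but nothing here
answers the granularity, ordering, or crash-recovery questions.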

These are solvable problems (the tools that solve them are known as
"databases") but they're prohibitive for the kind of casual use you're
talking about.

As far as I've ever been able to tell, Multics simply wished these
(and many other) problems away.

 > (No fork() though -- "processes" in Multics were more
 > like workspaces in Lisp or VMs in Smalltalk, and were very expensive to
 > create -- you got one when you logged in, and execution in it was
 > basically passed back and forth between the programs you ran and the
 > command shell, which actually made the starting of individual programs
 > far faster than in Unix (especially with the kernel level dynamic
 > linker), but meant the concept of implementing a "pipe" would require
 > coroutines in the same process.)

So, no protection between the various programs you ran? Good thing
Multics finished dying before the web browser era...

-- 
David A. Holland
dholland@netbsd.org

