At Wed, 3 Jun 2015 20:11:40 -0400, "James K. Lowden" <jklowden%schemamania.org@localhost> wrote:
Subject: Re: memory-mapped I/O (was: Re: Removing ARCNET stuffs)
>
> Thanks for your post, Greg.  Could you explain a bit how writing
> worked for sequential files?  Was there some kind of "automatically
> growing memory" object that would accomodate something like a log
> file?  The cursor supplied by a file descriptor, especially one
> supporting O_APPEND, isn't easy to emulate with mmap.

Indeed.  At least not without an ecosystem of conventions....

I was sort of hand-waving around the issue of "segments".  The real
implementation of Multics relied heavily on the fact that the machines
it ran on had a segmented memory architecture.  There are techniques to
do the same kinds of things without segments, but that's not how
Multics worked.

I hated segmented memory architectures like the 80286, but the truth is
they only hurt when you're dealing with them at user level in a unixy OS
with C or similar -- when they are managed by the OS and the programming
language and used to implement a filesystem, they are wonderful,
powerful things.

So, in fact a file was just a segment (with a name in the hierarchical
filesystem), and with write permission on the segment you could simply
write anywhere in it using a pointer.  Addressing wrapped around, so a
wild pointer couldn't escape its segment.  And since the stack was just
another segment, stacks easily grew upward (and were not executable), so
basic buffer overflows were next to useless to an attacker.

I.e. a segment with a name in the filesystem can be attached to the
address space of a process by giving its name to a system call.  So
that's mostly like open()+mmap(), but you get a range of memory
representing the maximum size of a segment every time.
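For readers who think in POSIX terms, here is a minimal Python sketch of "open()+mmap(), but always the maximum segment size".  It is an analogy under stated assumptions, not how Multics did it: SEGMENT_BYTES, attach_segment(), store() and load() are hypothetical names, the file is simply extended to the full size with ftruncate(), and the wrap-around of addresses is imitated with a modulo on the offset.

```python
import mmap
import os

# Hypothetical fixed segment size for this sketch (1 MiB of 8-bit bytes).
SEGMENT_BYTES = 1 << 20


def attach_segment(path: str) -> mmap.mmap:
    """Map a named file as a fixed-size "segment" (roughly open() + mmap()).

    The file is extended to the full segment size with ftruncate(), which
    leaves a hole on many filesystems, so every attach yields the same
    maximum-size range of addressable bytes.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if os.fstat(fd).st_size < SEGMENT_BYTES:
            os.ftruncate(fd, SEGMENT_BYTES)
        # On Unix the default is a shared, read/write mapping.
        return mmap.mmap(fd, SEGMENT_BYTES)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def store(seg: mmap.mmap, offset: int, data: bytes) -> None:
    """Write bytes at an offset, wrapping around inside the segment.

    Imitates the wrap-around described above: a wild offset lands somewhere
    inside the same segment instead of escaping it.
    """
    for i, b in enumerate(data):
        seg[(offset + i) % SEGMENT_BYTES] = b


def load(seg: mmap.mmap, offset: int, length: int) -> bytes:
    """Read bytes at an offset with the same wrap-around rule."""
    return bytes(seg[(offset + i) % SEGMENT_BYTES] for i in range(length))
```

Writing four bytes starting two bytes before the end of the segment puts the last two at offset 0, and bytes never written read back as zeros.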
Accessing a value at any address in the attached segment either paged it
in, if it already existed on secondary storage, or created a page of
zeros for it.  Changing a value at any address in the segment set the
"modified" flag for the page containing that address, which would
eventually cause the page to be written out to disk again, after which
the "modified" flag would be cleared.

I.e. at the most basic level there's no such thing as a strictly
sequential file in a Multics filesystem.  Anything more is just a
convention applied to a basic segment in the system.

There were conventions in Multics PL/1, with support from the compiler,
for using the system call to attach a segment to a process and then
assigning it to a pointer with a defined data type, thereby effectively
mapping any PL/1 data structure (string, array, record, etc.) onto a
file such that it could be accessed in a structured way with help from
the language and the compiler.  You could ask the system for the number
of bits (or words) in a file, so if you knew the file was written
"sequentially" in units of a known size you could "read" it again by
starting at the beginning and counting up to the number of bits
allocated to it.

Multics PL/1 even had a special feature that allowed you to use the
dynamic linker to attach a named data segment as an "external static"
variable with the same name.  Storage for the variable literally _is_
the content of the file with the same name -- it's automatically
attached to your address space at run time, paged in on reference (and
out on modification), i.e. directly accessed as memory using the data
type the variable is declared as.  Not as flexible as using a string to
look up a segment by name and then attaching that to a pointer, but
still a really cool feature that made programming easier in some
circumstances.
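The paging behaviour described above (a page of zeros on first touch of a never-written page, a per-page "modified" flag, and the flag cleared once the page is written back) can be modelled in a few lines.  This is a toy Python model, not Multics code: DemandPagedSegment, PAGE_SIZE and the dict standing in for secondary storage are all hypothetical.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # hypothetical page size for this sketch


class DemandPagedSegment:
    """Toy model of a segment whose pages are materialised on first touch."""

    def __init__(self, backing: dict[int, bytes] | None = None):
        self.backing = dict(backing or {})        # "secondary storage": page number -> bytes
        self.resident: dict[int, bytearray] = {}  # pages currently in memory
        self.modified: set[int] = set()           # pages whose "modified" flag is set

    def _page(self, pageno: int) -> bytearray:
        if pageno not in self.resident:
            # Page in from storage if it exists there, otherwise create a page of zeros.
            data = self.backing.get(pageno, bytes(PAGE_SIZE))
            self.resident[pageno] = bytearray(data)
        return self.resident[pageno]

    def read(self, addr: int) -> int:
        return self._page(addr // PAGE_SIZE)[addr % PAGE_SIZE]

    def write(self, addr: int, value: int) -> None:
        pageno = addr // PAGE_SIZE
        self._page(pageno)[addr % PAGE_SIZE] = value
        self.modified.add(pageno)  # set the page's "modified" flag

    def write_back(self) -> None:
        # Write every modified page out to storage, then clear its flag.
        for pageno in sorted(self.modified):
            self.backing[pageno] = bytes(self.resident[pageno])
        self.modified.clear()
```

Note that merely reading a page never marks it modified, so pages that were only looked at are never written back.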
These are the kinds of techniques I was talking about in my first post
about how one would design and implement algorithms for a true virtual
memory OS in ways that take advantage of files as memory.  These were
the beautiful and elegant things that were invented about 50 years ago
but that we're still not widely using to our advantage today.

As I recall, an ASCII text segment had the convention of simply ending
at the first NUL byte, and I think the location of this byte was
generally maintained in the filesystem directory entry for that segment,
so you could see the size of a text file without having to access it and
scan for the NUL.  There was a command that could reset the recorded
length, and I think it released any disk pages allocated to the segment
after that NUL, effectively compacting the file to its current length.

There was also a "vfile_" I/O module that allowed programs to have a
virtual open/read/write/close mapping over segments, i.e. pretending
that a segment is a device, with a virtual file I/O driver giving the
same API to access files that one would use for devices.  If I remember
right, files created with the "vfile_" module are "structured"
(record-oriented) files, with private internal data structures
describing their organization.  A log file that might be appended to
would probably be a structured file like this.  "Vfile_" was really
there, though, to make it easier to port programs written in languages
like FORTRAN, and for programmers used to record- or stream-oriented
I/O.

I've also (re?)discovered that there was a Multics C compiler complete
with a stdio-compatible runtime library (which perhaps used the "vfile_"
I/O module under the hood), so I'm almost 30 years behind in thinking of
how to support POSIX-like stdio on a Multics-like system!  I do seem to
remember it would have been installed on the system I used the year
after I left.
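To tie the NUL-terminated text convention back to the O_APPEND question, here is a hedged Python sketch of an append-only "cursor" layered on a fixed-size segment.  Everything here is hypothetical: TextSegment, recorded_length (standing in for the length that the text recalls was kept in the directory entry), and truncate() (which zeroes the tail as a stand-in for releasing disk pages).  It is not the vfile_ API.

```python
from __future__ import annotations


class TextSegment:
    """Sketch of the "text ends at the first NUL" convention on a fixed-size segment."""

    def __init__(self, size: int = 1 << 16):
        self.memory = bytearray(size)  # a freshly created segment reads as zeros
        self.recorded_length = 0       # hypothetical "directory" copy of the NUL position

    def scan_length(self) -> int:
        # What you'd have to do without the recorded length: scan for the NUL.
        nul = self.memory.find(0)
        return len(self.memory) if nul < 0 else nul

    def append(self, text: bytes) -> int:
        """O_APPEND-like write: always start at the current end of the text."""
        if 0 in text:
            raise ValueError("text segments cannot contain NUL")
        start = self.recorded_length
        end = start + len(text)
        if end >= len(self.memory):  # keep room for the terminating NUL
            raise OSError("segment full")
        self.memory[start:end] = text
        self.memory[end] = 0         # re-terminate the text
        self.recorded_length = end
        return start                 # offset where this write landed

    def truncate(self, length: int) -> None:
        """Move the end back and zero the tail (a stand-in for releasing pages)."""
        length = min(length, self.recorded_length)
        self.memory[length:self.recorded_length] = bytes(self.recorded_length - length)
        self.recorded_length = length

    def text(self) -> bytes:
        return bytes(self.memory[:self.recorded_length])
```

The point of keeping the length outside the data is the one made above: the end of the file is known without touching (and paging in) the segment itself, and every append starts there, much as O_APPEND positions each write at end-of-file.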
(No fork() though -- "processes" in Multics were more like workspaces in
Lisp or VMs in Smalltalk, and were very expensive to create.  You got
one when you logged in, and execution in it was basically passed back
and forth between the programs you ran and the command shell.  That
actually made starting individual programs far faster than in Unix
(especially with the kernel-level dynamic linker), but it meant that
implementing a "pipe" would require coroutines in the same process.)

Hopefully, in a really long round-about way, this helps answer your
question as to how file descriptors and O_APPEND could be emulated in a
system that's basically just virtual memory under it all.

However, there are stories about how poorly some language runtimes
performed using virtual file I/O, and how programs could run 20 times
faster when re-implemented in PL/1 using Multics-specific VM-based
features.  Remember that's not moving from unix-like,
system-call-per-operation stream I/O to virtual memory access, but just
moving from dancing pointers in library functions that emulate a stream
or record I/O API to using direct memory access.

BTW, there was apparently originally an idea that it should be possible
to restrict a process from growing a segment, and there was an "append"
right in the filesystem ACL mechanism, but at least one source says this
was ignored and all segments were allowed to grow on demand.  Perhaps it
was added after all in later releases.  I think, though, it would have
meant you could "allocate more pages" anywhere in the segment, as
opposed to strictly allocating more pages following the current "end"
of the file.

When I was using Multics, multi-segment files were becoming popular,
since a segment was just one megabyte (256K 36-bit words).  Some of the
big commercial users of the system I used were geological survey
companies processing seismic data to find oil, so they had "big data"
concerns.
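The arithmetic behind the segment limit: 256K words times four 9-bit characters per 36-bit word is 1,048,576 characters.  A file larger than that has to be spread across several segments, which turns every logical offset into a (component, offset-within-component) pair.  The Python sketch below shows only that translation; MultiSegmentFile and its flat list of components are hypothetical and are not the real Multics multi-segment file format, in which one segment describes how the others fit together.

```python
from __future__ import annotations

WORDS_PER_SEGMENT = 256 * 1024                       # 256K 36-bit words
CHARS_PER_WORD = 4                                   # four 9-bit characters per word
SEGMENT_CHARS = WORDS_PER_SEGMENT * CHARS_PER_WORD   # 1,048,576 characters


class MultiSegmentFile:
    """Toy multi-segment file: an ordered list of fixed-size component segments."""

    def __init__(self, segment_chars: int = SEGMENT_CHARS):
        self.segment_chars = segment_chars
        # Stands in for the "index" that records which components exist, in order.
        self.components: list[bytearray] = []

    def locate(self, offset: int) -> tuple[int, int]:
        """Translate a logical offset into (component number, offset in component)."""
        return divmod(offset, self.segment_chars)

    def write(self, offset: int, data: bytes) -> None:
        for i, b in enumerate(data):
            comp, within = self.locate(offset + i)
            while len(self.components) <= comp:  # add components on demand
                self.components.append(bytearray(self.segment_chars))
            self.components[comp][within] = b

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        for i in range(length):
            comp, within = self.locate(offset + i)
            # Components that don't exist yet read as zeros.
            out.append(self.components[comp][within] if comp < len(self.components) else 0)
        return bytes(out)
```

With a deliberately tiny component size (say 8), a write that straddles a boundary lands partly in component 0 and partly in component 1, which is exactly the bookkeeping a single, larger segment would have avoided.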
Multi-segment files were basically a collection of segments, with one
describing how all the rest fit together.  Multicians today think there
may have been better ways to fix the segment size limitation that would
have avoided creating fictions using multiple segments.

If you want to get lost in the details, but also learn a little about at
least one person's reflections on Multics implementation, I highly
recommend Paul Green's paper:

	http://ftp.stratus.com/vos/multics/pg/mvm.html

Another discussion of the early history of Multics reveals some
interesting things about how well the virtual memory filesystem worked
in practice:

	http://web.mit.edu/Saltzer/www/publications/f7y/f7y.html

See in particular the section "Modular division of responsibility".

There's lots more info at:

	http://www.multicians.org/

There are also lots of Multics manuals available here:

	https://archive.org/details/bitsavers?and[]=multics

-- 
Greg A. Woods
Planix, Inc.

<woods%planix.com@localhost>   +1 250 762-7675   http://www.planix.com/