Subject: Re: MFS over ISO-9660 union mounted with no swap space?
To: Gandhi woulda smacked you <greywolf@starwolf.com>
From: Mike Cheponis <mac@Wireless.Com>
List: tech-kern
Date: 05/14/1999 01:00:01
> # Firstly, the notion of "swap" is antiquated and has no purpose in an OS on
> # the verge of the 21st century.
> # 
> # The correct way to do this, IMHO, is to use unallocated filesystem space
> # as swap space.
> 
> Too much translation overhead.  The advantages of having a raw partition
> is that you don't have to go through the filesystem to access the swap
> space, and as an aside on the same level, you don't have to worry too much
> about what you're tromping on (the paging code keeps track of what's
> used and what isn't), so you won't be tromping inode blocks or whatever.
> 
> The paging daemon worries about paging; don't make it worry about
> the filesystem, too (yes, I know this is kind of what happens when
> one uses a swap file, which is why I don't use them).

This seems like a truly minor issue to me.  

It merely means that the abstraction needs to be split at a lower level,
and the filesystem built on top of that lower abstraction; the only thing
they have in common is the allocation bit vector.

This seems so obvious to me; what am I missing?
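
To make that concrete, here is a minimal sketch (hypothetical names, not
any existing kernel interface) of the lower-level abstraction I mean: one
block allocator that owns the allocation bit vector, with both the
filesystem and the pager layered on top of it:

/* Minimal sketch of a shared low-level block allocator.  Hypothetical
 * names; real code would need locking and a smarter free-block search. */

#include <stdint.h>

#define NBLOCKS (1u << 20)              /* example: ~1M blocks on the disk */

static uint8_t alloc_map[NBLOCKS / 8];  /* 1 bit per block: 1 = in use */

static int  blk_test(uint32_t b) { return alloc_map[b >> 3] & (1 << (b & 7)); }
static void blk_set(uint32_t b)  { alloc_map[b >> 3] |= (1 << (b & 7)); }
static void blk_clr(uint32_t b)  { alloc_map[b >> 3] &= ~(1 << (b & 7)); }

/* Hand out one free block, or -1 if the disk really is full.  The
 * filesystem calls this for file data and inodes; the pager calls it for
 * dynamic swap.  Neither needs to know what the other did with its blocks. */
long blk_alloc(void)
{
    for (uint32_t b = 0; b < NBLOCKS; b++)
        if (!blk_test(b)) {
            blk_set(b);
            return (long)b;
        }
    return -1;
}

/* Return a block to the common pool: a deleted file's data block and a
 * terminated process's swap block are freed exactly the same way. */
void blk_free(uint32_t b)
{
    blk_clr(b);
}

The point is only that swap and file data draw from one pool, so
"deallocating swap" when a process exits is the same operation as freeing
a file's blocks.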


> # The bit vectors that hold allocation status are in memory (as well as on
> # disk).  When some disk space is needed for "swap" then only the in-memory
> # allocation status vector is changed to reflect that some piece of the
> # disk is "used" and can't be allocated by real files.
> 
> Yeah, but you can end up with a lot of fragmentation this way.

Not necessarily; it depends on allocation policy.

Also, I don't know why everybody who has commented on this issue
seems to have been thinking that we're still dealing with 300 MB Fujitsu
Eagle drives!  Gosh, my buddy just installed a 22 GB IBM IDE disk into
his Linux box last night (it cost $421, incidentally).

As for fragmentation, unix should have a background process that auto-defrags.
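
Here is what I mean by "it depends on allocation policy": a hypothetical
extent allocator layered on the blk_test()/blk_set() bit vector sketched
above.  It hands out contiguous runs of blocks, searching from a hint
block (say, a block of the file the faulting process is using), which
fights fragmentation and also gives the place-it-near-the-data behavior
quoted just below:

/* Hypothetical policy layer on the same bit vector: allocate 'want'
 * contiguous blocks, searching from 'hint'.  Extents fight fragmentation;
 * the hint keeps dynamic swap near the data it belongs with. */
long blk_alloc_extent(uint32_t hint, uint32_t want)
{
    uint32_t start = hint % NBLOCKS;

    for (uint32_t off = 0; off < NBLOCKS; off++) {
        uint32_t b = (start + off) % NBLOCKS;
        uint32_t run;

        for (run = 0; run < want && b + run < NBLOCKS; run++)
            if (blk_test(b + run))
                break;

        if (run == want) {              /* found a big enough free run */
            for (uint32_t i = 0; i < want; i++)
                blk_set(b + i);
            return (long)b;
        }
        off += run;                     /* skip the blocks we just examined */
    }
    return -1;                          /* no run that big; caller falls back */
}

The always-running defragmenter is then just a low-priority thread that
migrates blocks to re-create runs like these.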


> # This method has the further advantage that (on a disk that's not nearly
> # full) you can often locate what I will call this "Dynamic Swap Space" near
> # the data files that are being used, and therefore, minimize total disk I/O
> # time compared with a fixed swap allocation scheme.

> Okay, you have a process that has just allocated a huge chunk-o-swap.
> Now another process wants to allocate some of that filesystem space
> as a file.
> 
> Who's going to lose, the one that got the page, or the one that wants
> the file?

This is a policy decision, not a technical decision.  Limited resources
force such policy decisions.


> What if the process that wanted the file had just been told that
> the space was available?

This is, once again, merely a resource allocation issue.


Guys, I really don't get it; this seems soooooo simple and useful!
And it's not like it's a new idea; I first heard about it being implemented
in a unix variant called DNIX in 1979, yes, 20 years ago(!).



> # Lastly, I think it's totally bogus on a VM system if I can't have an array
> # that uses up the whole disk if I want, so the programmer has this abstraction
> # of a Very Large Memory.  In braindead architectures like the i386, I'm
> # probably limited to 4 GB, but in reasonable architectures, I don't see why
> # there needs to be any other limit than available disk space.
> 
> Which is why we have swap files if you really need them.

Once again, let me shout at the top of my lungs: swap files are bogus!
(I know, shouting doesn't help...).

The problem is that a fixed amount of disk space is allocated for swap;
DynaSwap deals with this in a better way, I think.


> Once we can
> DEallocate swap, that will actually be a win,

Which is -exactly- what you do when the process terminates.

> but if you're going
> to do that, DynaSwap is a lose (see above), UNLESS you want to have it
> available as a tunable parameter for a LIVE filesystem, in which case
> you know what you're doing with your system.  If my system runs out
> of filesystem space because some process snarfs it up for swap space
> between the time I ran a statfs() call which returns a suitable
> amount of space left and a write() sequence which fails, I'm going
> to be justifiably pissed off unless I've set my system up to do this.

Policy issue, see above.
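
To be concrete about what one such policy could look like (this knob is
invented here purely for illustration): keep a reserve of free blocks
that dynamic swap may never touch, so the pager alone can't drive a
filesystem from "statfs() says plenty" to ENOSPC, and an administrator
who trusts the pager more than that just tunes the reserve down to zero:

/* One possible policy knob (hypothetical), on top of the allocator
 * sketched earlier: dynamic swap may not consume the last
 * swap_reserve_blocks of free space, so ordinary writes keep a cushion
 * and the statfs()-then-write() window is bounded by the reserve. */
static uint32_t swap_reserve_blocks = NBLOCKS / 10;   /* tunable, e.g. 10% */

extern uint32_t blk_free_count(void);   /* assumed: # of clear bits in map */

long swap_alloc(uint32_t hint, uint32_t want)
{
    if (blk_free_count() < swap_reserve_blocks + want)
        return -1;          /* refuse: pager must reclaim memory instead */
    return blk_alloc_extent(hint, want);
}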


> I tried to suggest that we claim inode blocks and convert them to
> data blocks in times of dire need, but the return between i-blocks
> and d-blocks is peanuts (it takes 32 inodes to make a data block
> on a 4k filesystem) and reallocation of freed data blocks to inodes
> proves problematic largely due to fragmentation and placement issues.
> 
> What you're suggesting is very close in principle to what I suggested.

I don't know exactly what you suggested, but I'd be happy to read what
it was if you give me a pointer.


In sum:

1. Dedicated swap is totally bogus in 21st century OSs

2. Auto-defrag should be a low-priority background task, always running

3. A fixed number of inodes is totally bogus in 21st century OSs

4. I should be able to have a VM system that allows me to have an array that
   occupies all of the available filesystem space if I want, and to be able to
   manipulate that array in my userland program as if I had all of that
   memory; after all, what the heck is a VM system for?  (A minimal sketch
   of what I mean follows this list.)
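
From the userland side, point 4 amounts to something like this sketch.
It assumes a 64-bit machine, and it uses MAP_NORESERVE, which is how some
systems (e.g. Solaris and Linux) spell "don't pre-reserve backing store";
with DynaSwap, backing store would simply be whatever free disk exists
when a page is actually touched:

/* Sketch: ask the VM system for an array much bigger than RAM and let
 * paging worry about backing store.  With dynamic swap the only limit
 * would be free disk space, not the size of a pre-carved swap partition. */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = (size_t)16 << 30;      /* a 16 GB array */
    double *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON | MAP_NORESERVE, -1, 0);
    if (a == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Touch a few pages scattered through the array; only the pages we
     * actually use ever need RAM or swap blocks behind them. */
    for (size_t i = 0; i < len / sizeof(double); i += (1 << 20))
        a[i] = (double)i;

    munmap(a, len);
    return 0;
}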


Thanks everybody for contributing to this discussion.  As you can see,
I have no interest in keeping unix sacred cows alive, 'cause sacred cows
make the best burgers... onward to the 21st century unix!

-Mike Cheponis