Subject: Re: persistent/restorable unix procs?
To: Travis Hassloch <travis@evtech.com>
From: Tom Pavel <PAVEL@SLAC.Stanford.EDU>
List: current-users
Date: 11/29/1995 14:20:19
>>>>> On Tue, 28 Nov 1995, Travis Hassloch <travis@evtech.com> writes:

> Has anyone done any work on (or looked into) how one might dump a process's
> state to disk & restore it, assuming it's a cooperating process?
> E.g., maybe you have a process that wants to save itself on purpose.
> To make it harder, how about an uncooperative process?


The task of saving a process's state is very much akin to the task of
migrating processes between nodes on a distributed network, a topic with a
good deal of literature.  Basically, the core file gives you most of what
you need (the memory image and register state), but you also have to
save/transfer the "hidden" kernel state: open file descriptors, sockets,
signal handlers, and so forth.  This is much easier in systems built from
the ground up with process migration in mind (Sprite, V, Plan 9), but it
can be done in Unix with appropriate modifications to the kernel and/or
libc.  Things are simpler still if the process is "cooperative," meaning it
avoids tricky Unix features that are hard to checkpoint or duplicate
(forking subprocesses, opening devices, IPC, etc.).  How much work you need
to do depends on the scope of your problem...
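To make the "hidden" file-descriptor state a little more concrete, here is a minimal sketch in C of how a cooperating process might checkpoint one open file itself: its path, its open(2) flags, and its current offset (which only the kernel knows until you ask via lseek()). It assumes the file still exists at the same path on restore and that the flags exclude O_CREAT and O_TRUNC. The struct and function names (fd_record, save_fd, write_checkpoint, read_checkpoint, restore_fd) are hypothetical and are not taken from Condor or any of the papers listed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical checkpoint record for one open file. The kernel holds
 * the offset, so the process must ask for it explicitly. */
struct fd_record {
    char path[256]; /* path originally opened; assumed unchanged on restore */
    int flags;      /* open(2) flags; assumed to exclude O_CREAT and O_TRUNC */
    off_t offset;   /* current file offset, obtained with lseek() */
};

/* Record fd's current offset. Path and flags are supplied by the
 * cooperating process, which remembers how it opened the file. */
int save_fd(int fd, const char *path, int flags, struct fd_record *rec)
{
    assert(path != NULL && rec != NULL);
    off_t off = lseek(fd, 0, SEEK_CUR);
    if (off == (off_t)-1)
        return -1;
    memset(rec, 0, sizeof *rec);
    snprintf(rec->path, sizeof rec->path, "%s", path);
    rec->flags = flags;
    rec->offset = off;
    return 0;
}

/* Write the raw record to a checkpoint file. Same-machine restore only:
 * struct layout and the size of off_t are not portable. */
int write_checkpoint(const char *ckpt, const struct fd_record *rec)
{
    FILE *f = fopen(ckpt, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(rec, sizeof *rec, 1, f);
    if (fclose(f) != 0 || n != 1)
        return -1;
    return 0;
}

/* Read a record previously written by write_checkpoint(). */
int read_checkpoint(const char *ckpt, struct fd_record *rec)
{
    FILE *f = fopen(ckpt, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(rec, sizeof *rec, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Re-open the file and seek back to the saved offset. Returns the new
 * descriptor (its number may differ from the original) or -1. */
int restore_fd(const struct fd_record *rec)
{
    int fd = open(rec->path, rec->flags);
    if (fd < 0)
        return -1;
    if (lseek(fd, rec->offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note that the restored descriptor may get a different number than the original; a real checkpointer would presumably dup2() it back into place.  Sockets, pipes and devices need far more than this, which is exactly why uncooperative processes are hard.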

Here are some papers from groups that have done this for Unix systems.
They are a bit old, but they may still make interesting reading.  As for
getting your hands on code, Condor is probably your best bet
(ftp.cs.wisc.edu:/condor).

K. I. Mandelberg and V. S. Sunderam, "Process Migration in Unix Networks,"
Winter Usenix 1988, p. 357.

Chad Hunter, "Process Cloning: A System for Duplicating Unix Processes,"
Winter Usenix 1988, p. 373.

David Nichols, "Using Idle Workstations in a Shared Computing Environment,"
Proc. 11th ACM Symposium on Operating Systems Principles (SOSP), 1987, p. 5.

Rafael Alonso and Kriton Kyrimis, "A Process Migration Implementation for a
Unix System," Winter Usenix 1988, p. 365.

M. J. Litzkow, M. Livny, and M. W. Mutka, "Condor - A Hunter of Idle
Workstations," Proc. 8th Int'l Conf. on Distributed Computing Systems,
June 1988.


Good luck,

Tom Pavel

Stanford Linear Accelerator Center
pavel@slac.stanford.edu                 http://www.slac.stanford.edu/~pavel/