tech-kern: Re: Program recovery using checkpointing

Subject: Re: Program recovery using checkpointing
To: SODA Noriyuki <soda@sra.co.jp>
From: Kamal R. Prasad <kamalpr@yahoo.com>
List: tech-kern
Date: 03/11/2005 06:24:19
--- SODA Noriyuki <soda@sra.co.jp> wrote:
> >>>>> On Fri, 11 Mar 2005 03:58:33 -0800 (PST),
> 	"Kamal R. Prasad" <kamalpr@yahoo.com> said:
> 
> >> One of the reasons why it's done in userlevel instead of kernel is
> >> that checkpointing has to store references to files as pathnames,
> >> and usual UNIX kernel doesn't remember pathnames once the file is
> >> opened.
> 
> > My patch doesn't remember pathnames. It constructs
> > pathnames based on struct proc fields (p_comm,
> > p_pid..)
> 
> It seems you are talking about the pathname of the command itself.
> What I'm talking about here are all files that the command is
> using, via file descriptors, or mmap, etc.
> 

No, I am talking about the checkpoint file itself.
The checkpoint filename is generated by my code when
doing a save/restore. The patch I posted contains
the source code that does what I am describing here.
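
As a rough illustration of deriving a checkpoint filename from the process name and PID (the values the kernel keeps in p_comm and p_pid), here is a minimal user-space sketch. The helper name ckpt_filename() and the "<comm>.<pid>.ckpt" layout are assumptions made for this example; they are not taken from the patch.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: build a checkpoint file name from a command name
 * and a process ID. The "<comm>.<pid>.ckpt" layout is an assumption made
 * for this sketch, not the format the patch uses.
 * Returns 0 on success, -1 if the name does not fit in buf.
 */
int
ckpt_filename(char *buf, size_t buflen, const char *comm, pid_t pid)
{
	int n = snprintf(buf, buflen, "%s.%ld.ckpt", comm, (long)pid);

	/* snprintf reports the length it wanted; reject truncated names. */
	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Because the name is computed from values the process already has, nothing about the path needs to be remembered between the save and the restore.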

> >> - To implement process migration, a pathname is more
> >>   portable than a file handle.
> 
> > The scheme which I have implemented allows the user to
> > specify the program execution points where he saves
> > process state and restores it. The data and stack
> > segments are quite vulnerable to corruption.
> 
> It seems you are missing the point.
> "Process migration" is:
> 1. do checkpointing.
> 2. move the checkpointed image to another machine.
> 3. do checkpoint restarting on the other machine.
> Due to the implementation of a file reference on Dragonfly,
> process migration simply doesn't work.
> That's a bad thing.
> 

My code isn't meant for process migration, and it
*skips* saving file descriptors. Because the process
is still running when a recovery is initiated, it can
re-use all file descriptors that are currently open.
With DragonFly BSD, by contrast, you cannot re-use
pipes, sockets, etc. In my case, every file
descriptor that was open before recovery (tlongjmp())
is usable after recovery. This is especially useful if
inetd gets a segfault but you want to re-use its sockets.

To migrate a running process, I would need to do a
lot more work, assuming the class of applications
isn't just scientific applications as in the case of
DragonFly BSD. I would have to migrate fds from one
kernel image to another, create a TCP connection so
that when the migrated process writes to a pipe or
UNIX domain socket the data is routed back to the
original system, and so on, plus some awful things
with shared memory.

> >> This is especially problematic at checkpoint restarting time.
> >> That's why Dragonfly allows only wheel group to do checkpointing
> >> by default.
> 
> > My patch ensures any and every unix process can
> > checkpoint without getting into any conflict.
> 
> It seems you are missing the security risk.
> Due to the way file references are recovered at checkpoint
> restarting time on Dragonfly (I mean, what ckpt_fhtovp() in
> sys/kern/kern_checkpoint.c on Dragonfly is doing), the checkpoint
> feature cannot be used from usual user privilege.
> Otherwise usual user can violate UNIX file access permission.
> 
OK, they are being ambitious in writing fds to disk
and recovering them. I am not that ambitious. I expect
the process that created a checkpoint to still be up
and running when a recovery is initiated. My code is
meant for program recovery, not process migration.

regards
-kamal


------------------------------------------------------------
Kamal R. Prasad
UNIX systems consultant 
kamalp@acm.org

In theory, there is no difference between theory and practice. In practice, there is:-).
------------------------------------------------------------


		