Subject: Re: Program recovery using checkpointing
To: None <kamalp@acm.org>
From: SODA Noriyuki <soda@sra.co.jp>
List: tech-kern
Date: 03/11/2005 20:07:27
>>>>> On Fri, 11 Mar 2005 01:00:08 -0800 (PST),
	"Kamal R. Prasad" <kamalpr@yahoo.com> said:

> The bigger question is, can this be integrated into the src tree
> of netbsd?

I'm not sure that implementing the checkpoint feature in the kernel
is the way to go; in the UNIX world it's usually done in a userspace
library.
One of the reasons it's done at userlevel instead of in the kernel is
that checkpointing has to store references to files as pathnames, and
a usual UNIX kernel doesn't remember the pathname once the file is
opened.
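
To illustrate why a userlevel library can do this easily, here is a
minimal sketch of an open(2) wrapper that records the pathname at open
time. The names ckpt_open(), ckpt_close(), ckpt_path() and CKPT_MAXFD
are hypothetical and made up for this example; I'm not claiming this is
how pkgsrc/devel/chkpt or any other existing library does it.

```c
#define _XOPEN_SOURCE 700	/* for realpath(3) under strict compilers */
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define CKPT_MAXFD 64		/* hypothetical table size for this sketch */

struct ckpt_file {
	char path[PATH_MAX];	/* pathname recorded at open time */
	int  flags;		/* open(2) flags, needed to reopen later */
	int  in_use;
};

static struct ckpt_file ckpt_table[CKPT_MAXFD];

/* open(2) wrapper: remembers the pathname that the kernel does not keep. */
int
ckpt_open(const char *path, int flags, mode_t mode)
{
	struct ckpt_file *cf;
	int fd;

	assert(path != NULL);
	fd = open(path, flags, mode);
	if (fd < 0 || fd >= CKPT_MAXFD)
		return fd;	/* failed, or too large to track in this sketch */
	cf = &ckpt_table[fd];
	/* Resolve after open(2) so that a file made by O_CREAT exists. */
	if (realpath(path, cf->path) == NULL)
		snprintf(cf->path, sizeof(cf->path), "%s", path);
	cf->flags = flags;
	cf->in_use = 1;
	return fd;
}

/* close(2) wrapper: forgets the record for this descriptor. */
int
ckpt_close(int fd)
{
	if (fd >= 0 && fd < CKPT_MAXFD)
		ckpt_table[fd].in_use = 0;
	return close(fd);
}

/* Returns the recorded pathname, or NULL if fd is not tracked. */
const char *
ckpt_path(int fd)
{
	if (fd < 0 || fd >= CKPT_MAXFD || !ckpt_table[fd].in_use)
		return NULL;
	return ckpt_table[fd].path;
}
```

At checkpoint time the library would walk ckpt_table and write each
recorded pathname and its flags into the checkpoint image. A real
library also has to cope with dup(2), fork(2), rename(2) and unlinked
files, which this sketch ignores.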

Dragonfly solves this problem by using a file handle instead
of a pathname as the file reference.
But this has at least two problems.
- To implement process migration, a pathname is more portable
  than a file handle. For example, if the checkpointing framework
  uses a file handle and /lib is a local filesystem, a reference
  to /lib/libc.so.12 made on one machine cannot work on a different
  machine. Conversely, if the checkpointing framework uses a pathname
  as the file reference, such a reference just works.
  (BTW, this is what Tim Rightnour pointed out in a private
  conversation.)
- Exposing file handles to userland has certain security risks.
  This is especially problematic at checkpoint restart time.
  That's why Dragonfly allows only the wheel group to do checkpointing
  by default.
  Also, I'm not sure that exposing file handles to ordinary users is
  always safe for all filesystems. At least, both NetBSD and Dragonfly
  allow only root to use the getfh(2) system call. But it seems
  Dragonfly actually exposes file handles to every user who can open
  the file, because its coredump file includes the file handles of the
  process's file descriptors when the process dumps core
  (see elf_putfiles() in DragonflyBSD's sys/kern/imgact_elf.c).
  Using a pathname doesn't have such security risks.
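
Both points can be seen in a sketch of the restart side. The record
layout (struct ckpt_rec) and the function name ckpt_reopen() are
hypothetical, invented for this example. Because the file is reopened
by pathname, the reference resolves on whatever machine does the
restart, and open(2) applies its normal permission checks, so the
restarting user gets nothing that he couldn't open anyway.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-descriptor record as saved in a checkpoint image. */
struct ckpt_rec {
	const char *path;	/* pathname recorded at open time */
	int	    flags;	/* original open(2) flags */
	off_t	    offset;	/* file offset at checkpoint time */
};

/*
 * Reopen a checkpointed file by pathname and restore its offset.
 * Returns the new descriptor, or -1 on failure (errno is set by
 * open(2) or lseek(2), e.g. EACCES or ENOENT).
 */
int
ckpt_reopen(const struct ckpt_rec *rec)
{
	int fd;

	assert(rec != NULL && rec->path != NULL);
	/* Never create, or truncate, a file at restart time. */
	fd = open(rec->path, rec->flags & ~(O_CREAT | O_EXCL | O_TRUNC));
	if (fd < 0)
		return -1;
	if (lseek(fd, rec->offset, SEEK_SET) == (off_t)-1) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A real library would also dup2(2) the result onto the original
descriptor number; that step is left out here to keep the sketch short.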

So, I recommend using a userlevel checkpoint library for now.
NetBSD pkgsrc already includes pkgsrc/devel/chkpt, for example.
--
soda