Subject: Re: CRITICAL ** Holes in default cron jobs ** CRITICAL
To: Warner Losh , Matt Thomas <matt@lkg.dec.com>
From: Stefan Grefen <grefen@hprc.tandem.com>
List: tech-kern
Date: 01/02/1997 10:55:09
In message <E0vf8Gh-0007ac-00@rover.village.org>  Warner Losh wrote:
> In message <199612301851.SAA29715@whydos.lkg.dec.com> Matt Thomas writes:
> : Actually, a
> : 
> : int unlink2(const char *name, const struct stat *statbuf);
> : 
> : would solve the problem.  In essence, you stat/fstat the file first (which
> : you are going to do anyway (to make sure it's on the device, old enough,
> : etc.)) and then you pass that stat buf to unlink2.  The kernel can then
> : verify that <name> is the same object as represented by the information
> : in *<statbuf> and then proceed with the deletion.  If the information
> : (dev,inode,generation) doesn't match, unlink2 fails.  The kernel can easily
> : make this an atomic operation.
> 
> You still have the race here.  Between the readdir() and the stat(),
> the file can change out from under you, and then you go ahead and
> delete the wrong thing because the stat info matches :-(.

This race can be detected in user mode: record the directory's st_mtimespec
and st_ctimespec before the readdir(), and after the stat() compare the inode
number returned by readdir() with st_ino in the stat buffer and check the
directory's timestamps again. If the inode number matches and the directory's
timestamps didn't change, it's still the same file. If not, rescan the directory.
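A minimal sketch of that check, assuming POSIX.1-2008 names (st_mtim/st_ctim
and d_ino; 4.4BSD-derived systems call the timestamps st_mtimespec and
st_ctimespec). stat_entry_checked() and enum lookup_result are hypothetical
names. The check is only as fine-grained as the directory timestamps: a change
that lands in the same clock tick as the previous one goes unnoticed, and
d_ino does not match st_ino for mount points.

```c
#define _POSIX_C_SOURCE 200809L   /* fstatat, dirfd, st_mtim, mkdtemp */
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

enum lookup_result { LOOKUP_OK, LOOKUP_RESCAN, LOOKUP_ERROR };

static bool ts_equal(struct timespec a, struct timespec b)
{
    return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}

/*
 * Find `name` in `dirpath` with readdir(), lstat it (via fstatat) into
 * *out, and report whether the result can be trusted:
 *   LOOKUP_OK     inode numbers agree and the directory's mtime/ctime
 *                 did not change while we were looking
 *   LOOKUP_RESCAN something changed underneath us; rescan
 *   LOOKUP_ERROR  name not found or a system call failed
 */
static enum lookup_result stat_entry_checked(const char *dirpath,
                                             const char *name,
                                             struct stat *out)
{
    enum lookup_result res = LOOKUP_ERROR;
    struct stat before, after;
    struct dirent *de;
    DIR *dir = opendir(dirpath);

    if (dir == NULL)
        return LOOKUP_ERROR;
    int dfd = dirfd(dir);

    /* Snapshot the directory's timestamps before reading it. */
    if (fstat(dfd, &before) == 0) {
        while ((de = readdir(dir)) != NULL) {
            if (strcmp(de->d_name, name) != 0)
                continue;
            if (fstatat(dfd, name, out, AT_SYMLINK_NOFOLLOW) == -1) {
                /* Entry vanished after readdir(): also a reason to rescan. */
                if (errno == ENOENT)
                    res = LOOKUP_RESCAN;
            } else if (fstat(dfd, &after) == 0) {
                bool same = de->d_ino == out->st_ino &&
                            ts_equal(before.st_mtim, after.st_mtim) &&
                            ts_equal(before.st_ctim, after.st_ctim);
                res = same ? LOOKUP_OK : LOOKUP_RESCAN;
            }
            break;
        }
    }
    closedir(dir);
    return res;
}
```

Only LOOKUP_OK hands back a stat buffer that is safe to pass on to something
like unlink2; on LOOKUP_RESCAN the caller starts over.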

A more convenient way to do that would be to return the inode generation
number in the result of readdir().

This kind of race is only dangerous for operations that modify the file
system; read-only operations like stat can simply be retried.

I think the unlink2 call solves the problem for the case at hand.
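For illustration, the check Matt describes for unlink2 can be approximated in
user space. This is only a sketch: unlink_if_same() is a hypothetical stand-in,
it compares just st_dev and st_ino (the inode generation, st_gen on BSD, is
left out because it is not portable), and ESTALE is an arbitrary choice of
error. Unlike the proposed system call it is NOT atomic, since the name can be
replaced between the lstat() and the unlink(); closing that window is what the
kernel version buys.

```c
#define _POSIX_C_SOURCE 200809L   /* lstat */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Same file object?  The proposal also checks the inode generation
 * (st_gen on BSD); it is not portable, so only (dev, ino) are compared. */
static bool same_object(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/* Hypothetical user-space stand-in for the proposed
 *     int unlink2(const char *name, const struct stat *statbuf);
 * Fails with ESTALE if `name` no longer refers to the object in *statbuf.
 * NOT atomic: `name` can be swapped between lstat() and unlink(). */
static int unlink_if_same(const char *name, const struct stat *statbuf)
{
    struct stat now;

    if (lstat(name, &now) == -1)
        return -1;              /* errno set by lstat() */
    if (!same_object(&now, statbuf)) {
        errno = ESTALE;         /* arbitrary choice of error code */
        return -1;
    }
    return unlink(name);
}
```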

For more general solutions a good starting point would be to look at the
DMAPI standard proposal (Data Management API, formerly DMIG), because they
have to solve basically the same problems in a nastier environment.

The goal should be to enhance the definition of an inode (which is included
in the readdir() result) so that it uniquely identifies a file (on the disk AND
in the time domain).
This can easily be done by adding the creation time in milliseconds to the
inode (plus workarounds for reboots, broken clocks, etc.) and a machine or
disk ID to avoid problems when moving disks between machines.
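A hypothetical layout for such a handle, to make the idea concrete. struct uino,
its field names and widths are illustrative assumptions, not an existing
on-disk format. Equality has to compare all three fields, because a reused
inode number on the same disk gets a different creation time and therefore
names a different file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "unique inode"; names and widths are illustrative only. */
struct uino {
    uint64_t disk_id;     /* machine or disk ID, so the handle survives moving the disk */
    uint64_t inode;       /* ordinary inode number; may be reused after a delete */
    uint64_t birth_msec;  /* creation time in ms; tells reuses of `inode` apart */
};

/* Two handles name the same file only if all three fields agree. */
static bool uino_equal(const struct uino *a, const struct uino *b)
{
    return a->disk_id == b->disk_id &&
           a->inode == b->inode &&
           a->birth_msec == b->birth_msec;
}
```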

Then add operations that can work on these handles (either by using a NOIO
open or by different/additional arguments to existing system calls).
A very elegant solution would be a unique-inode filesystem.
You could then even use sh, ls and rm to securely remove files
(let's assume -I means "display the unique inode"):

# ls -lI  /tmp/xx
BD452981764399A645 -rw-r--r--   1 grefen  wheel     1040 Dec 25 23:11 xx
# rm /uinofs/BD452981764399A645
#

This would only remove the file /tmp/xx that existed at the time of the ls,
regardless of whether the rm is issued immediately or a year later.
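A sketch of how such a name could be formed and parsed back. The /uinofs
mount point, struct uino and the fixed-width encoding (three 64-bit fields
written as 48 hex digits) are assumptions; the encoding behind
BD452981764399A645 is not specified, and this one is longer.

```c
#include <assert.h>
#include <ctype.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handle: disk ID, inode number, creation time in ms. */
struct uino { uint64_t disk_id, inode, birth_msec; };

#define UINOFS_PREFIX "/uinofs/"   /* hypothetical mount point */
#define UINO_HEX_LEN  48           /* three 64-bit fields, 16 hex digits each */

/* Write the handle as PREFIX + 48 upper-case hex digits. */
static int uino_to_path(const struct uino *u, char *buf, size_t len)
{
    return snprintf(buf, len,
                    UINOFS_PREFIX "%016" PRIX64 "%016" PRIX64 "%016" PRIX64,
                    u->disk_id, u->inode, u->birth_msec);
}

/* Parse PREFIX + exactly 48 hex digits back into a handle. */
static bool uino_from_path(const char *path, struct uino *u)
{
    size_t plen = strlen(UINOFS_PREFIX);

    if (strncmp(path, UINOFS_PREFIX, plen) != 0 ||
        strlen(path) != plen + UINO_HEX_LEN)
        return false;
    for (size_t i = plen; i < plen + UINO_HEX_LEN; i++)
        if (!isxdigit((unsigned char)path[i]))
            return false;
    return sscanf(path + plen, "%16" SCNx64 "%16" SCNx64 "%16" SCNx64,
                  &u->disk_id, &u->inode, &u->birth_msec) == 3;
}
```

Because the name is derived only from the handle, it either still refers to
the original file or refers to nothing, no matter when it is used; that is the
property the rm example relies on.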


Stefan

> 
> Warner

--
Stefan Grefen                                Tandem Computers Europe Inc.
grefen@hprc.tandem.com                       High Performance Research Center
You should never bet against anything in science at odds of more than
about 10^12 to 1.
                -- Ernest Rutherford