NetBSD-Users archive


Re: birthtime [was: netbsd : internals : ...?]



    Date:        Mon, 29 Jul 2019 12:14:59 +0200
    From:        Rocky Hotas <rockyhotas%firemail.cc@localhost>
    Message-ID:  <20190729101458.423drwvlknf3hdsd@delpotro>

  | If you doubt about the usefulness of birthtime, you surely encountered
  | several reasons to do this.

It is almost impossible to demonstrate a negative by examples.   It can
sometimes be done by a mathematical proof, but unless we can reduce
"useful" to some mathematical (or logical) abstraction, which I doubt,
that method is unavailable.   On the other hand, to demonstrate that
something is useful all you need to do is provide one single example of
its use.    I cannot do that as I doubt that there is one (not one that
can be implemented).   If you (or someone else) believes this is useful,
you must be able to say why (that is, how you are going to use it).

  | I do not deeply know Unix or FFS internals, so I can not give a technical
  | reason, "from the inside", to justify the usefulness of birthtime.

You don't need to - in fact it is probably better that you don't.
Forget all technical details (we may come to some of those later)
and just worry about what you're going to use the data for (a bit
more on this below).

  | But I assume this definition as a consequence of the following
  | one: the `birthtime' of a file is the time when this file is created, ever.
  | This should not be dependent on the filesystem.

In that case, the data would be better kept somewhere outside the
filesystem - perhaps in the file itself like how the header of a
jpeg file can contain the date that the photo was taken, which then
(unless deliberately modified) remains with the photo file, however
it is moved around.   Similarly HTML files can have dates in the meta
data at the head of the file.
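
A minimal sketch of reading such an embedded date back out of an HTML
file, using only the Python standard library.   It assumes the page
carries a <meta name="date" content="..."> element; that naming is a
common convention rather than anything guaranteed, and the helper
names (MetaDateFinder, embedded_date) are made up for this illustration.

```python
from html.parser import HTMLParser


class MetaDateFinder(HTMLParser):
    """Remember the content of the first <meta name="date"> element seen."""

    def __init__(self):
        super().__init__()
        self.date = None

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags matter, and only the first date found is kept.
        if tag != "meta" or self.date is not None:
            return
        attributes = dict(attrs)
        # The attribute name "date" is an assumed convention (see above).
        if (attributes.get("name") or "").lower() == "date":
            self.date = attributes.get("content")


def embedded_date(html_text):
    """Return the date string stored in the document itself, or None."""
    finder = MetaDateFinder()
    finder.feed(html_text)
    return finder.date
```

Because the value lives in the file's contents, it travels with the
file through copies, archives and transfers between filesystems,
which inode meta-data does not necessarily do.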

  | A file may be created in a filesystem and then moved, copied, or modified.
  | Yes, one can object that when a file is copied, it actually becomes another
  | file, so the birthtime definition in this case is ambiguous.

It isn't ambiguous, it needs to be defined one way or the other, or it
cannot be implemented (the code to implement it must follow some set of
rules - the program - and while those can have conditions, it all becomes
useless unless we can say just what rules have been followed.)

  | I'll keep considering the birthtime as the one of the original file,
  | which should be preserved by its copy (this is arbitrary, I know).

There is nothing wrong with being arbitrary.   If this is to remain, we are
(eventually) going to need a definition, and as part of that we will need
to make decisions.  Some of those are likely to be for various good reasons,
but others are likely to be simply arbitrary choices.   There's nothing
wrong with that.

But we're getting ahead of ourselves, we need to find the use case first,
and then we can work out a definition for the field which allows that
use to actually be implemented.

  | `birthtime' of a file represents a kind of information, which Last
  | Modified time can not keep.

Of course, if they were the same we would definitely not need both.
It is even (kind of) obvious from its name, what kind of info birthtime
is supposed to be, but none of that shows a use for the data (there are
times when we believe that some data simply must be useful - it seems
obvious - but when we really look into it, we fail to find anything that
we really need it for.)

  | Consider a configuration file. Assume that it has been created 10 years ago,
  | when the company was running Linux. 5 years ago, the company switched to
  | NetBSD.   You are a new employee and you just discovered that this
  | configuration file has odd behaviours.
  | If you are able to know that this file has been created during
  | the "Linux epoch", you can immediately suspect about the Linux syntax being
  | responsible these problems.
  | If you only see the Last Modification time, which may be last month, the
  | reason for this inconvenient will be harder (and slower) to be determined.

OK, now we are getting to something that might be a use.   You want the
birthtime to tell you when the data in the file was first created, so
you can base other decisions on that.   I'd submit that in that case,
putting the relevant date/time in the contents of the file would be more
appropriate, but let's ignore that, as we're concerned right now with
uses for the data, not how it is stored.

  | This may also be subject to a script: all the files whose birthtime is
  | lower than `some date' must be subject to this string substitution. A
  | silly example: `uname' instead of `lsb_release'.

  | I had a very similar issue related to file encoding: a file created on
  | Windows with ISO-8859-1 was not properly readable on BSD. Being able to
  | know the file birthtime helped to know the original file was created
  | on Windows, and therefore its original encoding.

Those two are simply poor uses, for the first it is hard to suggest
an alternative without more info as to what is actually happening, but
for the second, relying upon any date info is simply wrong ... even
after you have done the conversion, someone can create a new file on
windows, and ship it to a unix system, where it needs conversion.  The
date doesn't tell you anything useful at all.   For that, what you need
to know is what was "wrong" with the file (my guess would be \r\n line
endings, or something related) and then the correct process is to
first determine that the file is of the correct generic type (a text
file I am assuming, since you gave a character set name) and then test
if it has that windows characteristic, and if so, then convert it to
the unix form.   Creation/birth/modification times are completely irrelevant.
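
A minimal sketch of that content-based process, assuming (as guessed
above) that the problem really is \r\n line endings.   The "is it
text" test here is a deliberately crude stand-in (no NUL bytes), and
both function names are invented for the illustration.

```python
def looks_like_text(data: bytes) -> bool:
    """Crude heuristic, an assumption of this sketch: text has no NUL bytes."""
    return b"\0" not in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF to LF, but only for text that actually contains CRLF."""
    if not looks_like_text(data) or b"\r\n" not in data:
        # Not text, or already in unix form: leave it untouched.
        return data
    return data.replace(b"\r\n", b"\n")
```

The decision depends only on what is in the file now, so running it
again next year on an already converted file changes nothing, and a
file freshly shipped from windows is still caught.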

If you need more convincing, recall that by your definition (above) of
what birthtime should be, making that change to the file contents would
not alter the file's birthtime - next year when you come back and need
to make the same decision (about this file, and others) again, a test of
its birthtime will be useless, as this file would still retain its original
windows birthtime, with no indication that it had already been modified to
unix form.

The same applies to your configuration file example earlier, if one were
to use the info that it was created under linux to look for differences,
and correct something that is inappropriate for NetBSD, then after that
correction it is going to retain its "created under linux" birth time
marker.   Next time there is an issue with the file, whoever is looking
into it is going to be (probably) wasting time looking for more differences
between linux and NetBSD.   The right way to deal with that kind of problem
is to look at the config, the manual for the config file (or program that
uses it) under NetBSD, and verify that each setting is set to the appropriate
value for NetBSD, and that no settings that should be being made have
simply been omitted.   Not only is this likely to find something that has
been set to a linux appropriate value, but will also allow any other config
values that are no longer appropriate to be updated at the same time
(much unrelated to which operating system is in use is likely to have
changed in 10 years or whatever).

  | `birthtime' is a human useful information, which may be supportive to
  | determine some features of the file and to better identify is origin. It's
  | important whether the file is a document, a configuration file, a photo,
  | or a song.

It could be.   But I'm still trying to find out what it is important for,
how it is actually used (productively) to make it all that important.

Once we find something with a genuine use, I will show (I believe) how that
cannot be implemented (whatever the use is - though the details of that
demonstration will vary depending upon what the use is).

But my general method will be to set up 2 scenarios, with 2 files, and
perform identical operations on them, where in one case your use requires
one outcome for the birthtime for one of the files, and a different outcome
for the other.   As soon as that is demonstrated, it is shown that the
proposed use is unimplementable by the filesystem as it cannot guess which
of the two cases you intended (you might be able to, which is why some kind
of meta-data in the file itself can sometimes work where filesystem meta-data
does not).
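
The shape of that argument can be sketched as a toy model.   Everything
in it is hypothetical: the two scenarios (restoring a file from backup,
and starting a new document from a template) are invented purely for
illustration and are not taken from any proposed use, and the two
"policies" are just the obvious candidates for what a copy could do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    original_birth: int  # birthtime of the source file
    copy_time: int       # moment the copy is made
    wanted_birth: int    # what the user's purpose needs the copy to show


# Both users perform the identical operation: copy src to dst.
RESTORE = Scenario("restore from backup", 100, 500, wanted_birth=100)
TEMPLATE = Scenario("new document from template", 100, 500, wanted_birth=500)

# A policy sees only what the filesystem sees, never the user's intent.
POLICIES = {
    "preserve": lambda original_birth, copy_time: original_birth,
    "reset": lambda original_birth, copy_time: copy_time,
}


def satisfied(policy_name, scenario):
    """True if the policy gives the copy the birthtime the user wanted."""
    result = POLICIES[policy_name](scenario.original_birth, scenario.copy_time)
    return result == scenario.wanted_birth


def policies_satisfying_all(scenarios):
    """Names of the policies that give every scenario its wanted outcome."""
    return [p for p in POLICIES if all(satisfied(p, s) for s in scenarios)]
```

Each policy works for one scenario alone, but since the inputs the
filesystem sees are the same in both, neither candidate can give the
two users the different answers they want.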

But I cannot do that without a clear demonstrated use, with the actual
requirements (what the data must mean) precisely stated.   Further, without
that we cannot produce a definition of the field either, and without a
definition, it is also impossible to (rationally) implement.

Finally, once again, I do not claim that knowing when a file (or data
in a file) was originally created is necessarily useless - just that no
such use can be correctly implemented by the filesystem (that is, a
birthtime field in the inode is useless).    That's why when someone
comes up with a use for the field, I'm prepared to spend some time to
show how the filesystem cannot implement that use.

To do that I need to be told the exact properties that you require of
the data - what it represents, what sets it initially (and to what value),
and what, if anything is allowed to change it, etc.   All of that information
should be simple to extract from the intended use.

And note, that rather than showing that the proposed birthtime use cannot
be implemented, I might sometimes show instead that the statement of
operations (how the data is created and changed, etc) cannot satisfy your
stated use case.

You do not need to explain how to represent time, or what time really means;
while time is an endlessly fascinating subject, much more complex than most
people realise, those considerations will not be relevant here (any more
than they are to atime, ctime and mtime, all of which are useful).

kre

ps: if anyone actually believes that the current definition (and
implementation) of the birthtime field is useful for anything, now
would be a good time to speak up.



