NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
lib/50143: various bugs/weirdness with parsedate(3) (libutil) (+ some suggested fixes)
>Number: 50143
>Category: lib
>Synopsis: various bugs/weirdness with parsedate(3) (libutil) (+ some suggested fixes)
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: lib-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Aug 11 19:20:00 +0000 2015
>Originator: Robert Elz
>Release: NetBSD 7 & -current (2015-08-11) (NetBSD 6 has different bugs)
>Organization:
Prince of Songkla University
>Environment:
This is where I'm running send-pr and is not material...
System: NetBSD perseus.noi.kre.to 6.99.17 NetBSD 6.99.17 (GENERIC) #1: Fri Feb 22 22:09:50 ICT 2013 kre%jade.coe.psu.ac.th@localhost:/usr/obj/current/i386/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
Sorry, this is going to be a long PR - I thought it better to
send one PR containing all of the problems with parsedate(3) I've
noticed today, rather than 4 or 5 (or whatever) for this one
relatively obscure libutil function.
If you don't care about the grizzly internals of time related
functions, you should probably just delete any e-mail about
this PR (including this message). That probably leaves about
3 or 4 of us who care...
This issue is not totally irrelevant only because parsedate() is
used to implement date(1)'s -d option (which is where I started).
I had (have) a need to use date -d in a script, and wanted to
know what format in which I should generate the date for it to
parse. date(1) just refers to parsedate(3) - but unfortunately
that man page is not very useful - it just handwaves over what
formats are accepted, and what they mean (aside from now being
somewhat out of date - more on that later).
So, on to the source. While looking at the yacc input to work
out a good format for me to use (the one I picked will be
obvious in the examples below) I first noticed ...
#define HOUR(x) ((time_t)(x) * 60)
which looks OK, but I wondered who would want hours converted to
minutes, rather than seconds which is the normal basic time unit,
so
grep 'HOUR(' parsedate.y
and found stuff like ...
{ "est", tZONE, HOUR( 5) }, /* Eastern Standard */
{ "edt", tDAYZONE, HOUR( 5) }, /* Eastern Daylight */
which was all fine, well sort of ... it could use HOUR(4) for
"edt" which would make the offset explicit, rather than just
setting tDAYZONE rather than tZONE and then applying a one hour
correction (by magic) later.
But ignore that, I also saw ...
{ "cast", tZONE, -HOUR(9.5) },/* Central Australian Standard */
{ "cadt", tDAYZONE, -HOUR(9.5) },/* Central Australian Daylight */
for which that macro (HOUR) will obviously fail miserably (that is,
it would convert 9.5 to a time_t, rounding it up to 10, (or perhaps
truncating it to 9, doesn't matter) and then multiply by 60 to get
minutes. Bogus.
At the time (grep being what it is) I didn't know that all of the
HOUR(N.5) lines in parsedate.y were #if 0'd out, making the issue
with HOUR() really moot ... but as it was trivial to fix ...
#define HOUR(x) ((time_t)((x) * 60))
I just did that (multiply first, as a float if x is N.5 otherwise int.)
As long as only simple fractions of an hour are picked, and as only .5
is currently used, it is fine (.25, .1, .333, ... would also be OK).
[Aside, there is no issue at all with overflow here,
whether x is int or float - even 16 bit int arithmetic
won't overflow till x is bigger than 500 - and the
range of x is (at the outside, approx) [0..25] - it is
also all compile time constants, no user input to worry about.]
[Aside 2: this issue, such as it is, has been there
since parsedate() was added to NetBSD in 2006, including
the 3 days it was there as getdate() - I did not attempt
to go trawling back through usenet sources to see just
where this really originated.]
Then I went to test it ... and found that cast/cadt didn't work
at all (that's when I found the #if 0 stuff). Simple to
remove the relevant #if 0, and test again. "cast" worked fine,
"cadt" produced nonsense. That made no sense to me (I won't bother
showing the results, as to reproduce them requires more work than
is reasonable) so I tested again with one of the other standard
and summer time pairs...
date -u -d '2001-01-01 00:00:00 est'
Mon Jan 1 05:00:00 UTC 2001
That's fine (just as "cast" had been fine -- its 9:30 offset
worked properly with the corrected HOUR()).
But ...
date -u -d '2001-01-01 00:00:00 edt'
Thu Jan 1 04:59:59 UTC 1970
That's nonsense. Obviously. Try it!
The underlying cause of that problem is this ...
/* We rely on mktime_z(NULL, ...) working in UTC */
result = mktime_z(NULL, &tm);
result += Timezone * 60;
in Convert() in parsedate.y. There is one obvious problem with
that, which explains the gibberish output - the result of mktime_z()
isn't tested for errors. When result is -1 we get one second (-1)
before the time indicated by the timezone, hence 4:59:59 for edt
where est is 5:00:00... (you'd get something different if you don't
use -u and have date output in your local time zone.) The Jan 1 1970
comes as the rest of the input (which would have been included in
a non -1 result) has been lost. You get 4:59:59 Jan 1, 1970,
(with -u, or in a UTC zone) regardless of the input, whenever "edt"
is included (and similarly constant output in any other zone).
Adding
if (result == -1) /* possibly add errno test */
return result;
allows Convert() to signal an error to its caller, rather than
just returning rubbish. (Wait, don't do that yet!)
The actual error is caused by the immediately preceding (executed)
code ...
switch (DSTmode) {
case DSTon: tm.tm_isdst = 1; break;
case DSToff: tm.tm_isdst = 0; break;
default: tm.tm_isdst = -1; break;
}
That all looks fine, and in one case, i.e. this one...
if (Timezone == USE_LOCAL_TIME) {
result = mktime(&tm);
} else {
it is fine, but we're in the "else" case, and it isn't.
Note the comment (quoted above) and here again ...
/* We rely on mktime_z(NULL, ...) working in UTC */
That's OK, mktime_z(NULL, ...) does work in UTC, but in UTC, it
makes absolutely no sense at all to have tm.tm_isdst = 1;
That's impossible, UTC has no summer time, it is always standard
time, which is why "est" worked, and "edt" dind't, the former
is DSToff, the latter DSTon. mktime_z() (and mktime()) checks
that the resulting tm_isdst is compatible with the one passed in,
and returns an error if the time converted doesn't have the
summer time property that tm_isdst requests (if it requests one
at all.)
Now at first glance it seems as if just returning an error when
mktime_z() fails is reasonable, and for most failure modes it
would be. Here it even looks OK, after all, Jan 1 is not
summer time (daylight saving) in the US Eastern time zone, so
asking to convert "1 Jan, 2001 EDT" is erroneous, and giving
an error response would be reasonable.
Unfortunately ...
date -u -d '2001-07-01 00:00:00 edt'
Thu Jan 1 04:59:59 UTC 1970
that's also nonsense, but July 1, 2001, was summer time in
Eastern US (or most of it) so that conversion should not fail.
Of course, mktime_z(NULL,...) (as it works in UTC) is always
going to fail for any summer time conversion, regardless of
what would be appropriate in the timezone selected.
What should happen here is anyone's guess, the parsedate(3) man
page being particularly useless, it gives a list of timezone
abbreviations, and says they work (or implies they do, or that
they mean something) - including both est & edt - but doesn't
give any clue at all what they actually mean when present.
For my case, I chose to fix it by always calling mktime_z()
with tm_isdst == 0, so the only reason for it to fail is if
the data is out of range. If that happens, I just return
failure (best that can be done) but if mktime_z() succeeds,
and DSTmode == DSTon I just correct for summer time, by
brute force subtracting an hour (subtract, as we're manipulating
UTC, not the local time - an hour advance for local time is
equivalent to UTC being an hour earlier.) That's not good,
as it allows 2001-01-01 EDT without error (but perhaps it should,
who knows?), and because it assumes that summer time is always
one hour ahead of standard time - the localtime() and mktime()
libc functions go to extraordinary measures to avoid assuming that.
I'm not including a patch, as the changes are mostlty trivial once
you know what it is you really want to accomplish, the harder
part is deciding what that is.
While I was debugging this, I came across another problem.
The code attempts to use errno tests to distinguish between
the case where -1 just happens to mean Dec 31 1969 23:59:59 UTC
and the case where it indicates an error.
Unfortunately, it does that very poorly (well, it really cannot
be other than poor, as using errno this way is a horrible botch,
but it does it even worse than it needs to.)
The start of parsedate() (omitting the local var declarations) is ...
time_t
parsedate(const char *p, const time_t *now, const int *zone)
{
saved_errno = errno;
errno = 0;
and then at the end ..
if (errno == 0)
errno = saved_errno;
return Start;
}
which looks like it should be doing the right thing, if there's no
error, leave errno at whatever it was on entry, otherwise return
the error code, so if Start (which regardless of its name, is
actually the answer) == -1 the caller can tell the difference
between Dec 31 1969, and "error occurred".
Unfortunately, that assumes that errno != 0 --> an error occurred,
which it doesn't actually mean. While watching this while debugging
the issue above, I always found errno != 0, in fact errno == ENOENT
which is an error code that isn't in parsedate.y at all - so it
couldn't have come from there. In my case, the ENOENT came from
the lack of /etc/malloc.conf (or whatever it is intended to be
called) on my system (it rarely exists on anyone's system I suspect).
When malloc() checks for it, errno gets set, and stays set.
To do the errno trick as well as it can be, the code needs
errno = 0;
sprinked through it, all over the place - every time something is
about to be called that might fail, clear errno first, so that
any earlier sys call failures, which didn't amount to errors
parsedate() cares about can be ignored. Then if a function does
fail, and errno has become set, assume that that errno relates to
the failure (in these functions it probably does, as it is always
explicitly set) and return it. If the function didn't fail, then
errno might be anything, and must be cleared again before this
trick is safe to employ around another function call.
I haven't (yet anyway) even attempted to fix that one.
While here, as indicated above, I was looking at parsedate(3),
and aside from how useless it is as documentation of the
function, there are a couple of other issues...
The BUGS section says ...
2 The parsedate() function cannot compute days before the unix epoch
(19700101).
which hasn't been true since Dec 2011, if I read the cvs log
correctly.
date -u -d '1918-11-11 11:00:00 +0100'
Mon Nov 11 10:00:00 UTC 1918
and also ...
3 The parsedate() function assumes years less than 0 mean - year, years
less than 70 mean 2000 + year, years less than 100 mean 1900 + year.
the second clause of which is only partly true now - it was
corrected in the examples earlier in the man page, but not here.
The man page is also lacking anything that could be counted as
useful information (aside from the calling sequence for parsedate()).
>How-To-Repeat:
date -u -d '2001-07-01 00:00:00 edt'
and by reading the man page in conjunction with the sources.
>Fix:
As I have no idea what is really meant to be implied by
'2001-07-01 00:00:00 edt'
it is hard to know what the correct fix really is. If it is
intended to convert the time as if TZ=America/New_York
was in place (actually assume US Eastern time) then I suspect
quite a lot of work is required, as the primitives all work
in either UTC (GMT) or the caller's local zone, it would not
be trivial to handle properly (that is, to correctly discover
whether or not it was summer time, in Eastern US, in July 2001,
when my local zone is somewhere else entirely.)
Note that even attempting that implies buying in to the hypothesis
that it is possible to extract some meaning from time zone
abbreviations - which there really isn't, A whole bunch of
plausible ones are #if 0'd out of the current sources - not
just the ones that have fractional hours, which don't work
without HOUR() corrected as above - but also ones which are
just ambiguous so the code just arbitrarily assumes one (the
man page says the abbreviations exist, but doesn't bother to
say what they refer to - it just assumes "you must know").
What's more (for example) the abbrevtations chosen by parsedate.y
aren't walways what other code uses ... the "cast" example I
used above is one such case. That's supposed to be
Central Australian Standard Time
but nothing else calls it that, it is either just
Central Standard Time (CST)
or
Australian Central Standard Time (ACST)
older versions of the tz database had the former, more
recent ones the latter, none have CAST. (The same is
true for the other Austr abbreviations that aren't #if 0'd
out - I didn't check to see which of the others in the
list actually agree with other users - the US ones
obvously do.)
So, one reasonable choice might be to just ditch the whole
"parse the timezone abbreviation" nonsense completely, and
require numeric -0500 -0400 +0930 +1030 zone indicators instead
when a time is to be converted that is neither local time, nor
UTC. They work in all (not even so-) recent versions of parsedate().
Of course, other than one obscure example in parsedate(3) these
(the numeric offsets) aren't mentioned at all (unlike the
alphabetic abbreviations) so suddenly requiring them might be
a bit much of a leap... (Also: it isn't clear if the one example
intended the -0500 it used to be US EST (probable, but not stated),
or somewhere in central/western Asia).
Fixing HOUR() and (perhaps) removing the #if 0's around the
half hour zones is an optional extra to all of this - bizarre
that that one is where I started looking!
The man page needs a total rewrite.
errno handling needs to be totally renovated - or even better,
just trashed. To do the latter, we probably need to replace
parsedate() with a new function with a new calling sequence.
I'd suggest something like (excuse any C syntax errors here,
this is just an example)
const char *
parsetime(struct timespec *result,
const char *datestr,
const time_t *time,
const int *tzoff)
which would allow times like 12:25:47.1027 which are currently
parsed to actually be returned, rather than simply ignoring the
fractional seconds (I don't think they're even rounded), and
also allow the function to return a string indicating what failed
when something goes wrong, rather than just "it failed" - a NULL
return would indicate success, non-NULL would be a pointer to
a const string which indicates what caused the problem.
Or return an enum value (or int) and provide a function to convert
that to a string (ala errno and strerror() but with more appropriate
values to the use case.)
The old parsedate() function could be a wrapper around that
for compatability with any 3rd party apps (NetBSD code,
including date, would all be fixed to use the new interface
of course) which happen to know this old interface and use it.
Whatever is done needs a pullup to NetBSD 7.
For NetBSD 5 & 6, best advice is probably just not to use
parsedate() or date -u at all - they are horribly broken in
all but the very simplest cases.
The worst of that is fixed in NetBSD 7 & current, but hasn't
been pulled up. Perhaps it could be - I doubt any of the
changes (other than 64 bit time_t's which doesn't directly
affect this code at all) would break anything.
NetBSD 4 (where this all first appeared) is broken too of course,
but that's beyond EOL and won't ever get improved.
Once again, apologies for the length of this. Good luck fixing it.
Home |
Main Index |
Thread Index |
Old Index