Source-Changes-D archive


Re: CVS commit: src/bin/sleep



    Date:        Sat, 26 Jan 2019 21:49:51 +0100
    From:        Joerg Sonnenberger <joerg%bec.de@localhost>
    Message-ID:  <20190126204951.GA7782%britannica.bec.de@localhost>

  | No, the fragile refers to the problem that many locales use both "." and
  | "," in numbers.

Yes, like English...   I wasn't previously aware that '.' was ever used
as the grouping char, though I did believe that some locales use a
space for that purpose.

  | While the standards decided in their infinite wisdom
  | that grouping characters shouldn't be parsed in floating point context,
  | it is very confusing at least for casual users. "sleep 1.000" would be
  | perfectly sensible for a German user, but certainly not do what is
  | expected.

Do you have a solution for that which can actually be implemented?

While the original code I wrote to deal with the (not-PR'd) bug report
- the one you commented on originally from a week ago - was fragile in
this area, the current one is slightly less so I think.

If the arg can be parsed by strtod() in the user's locale, it will be.
If it cannot, but if it can be parsed in the C locale, it will be handled
that way instead.   If neither works, it is an error.
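
For the archives, the two-step parse described above might be sketched
roughly as follows.   This is not the code committed to src/bin/sleep; the
names parse_whole() and parse_seconds() are made up for illustration, and
temporarily switching LC_NUMERIC with setlocale() is merely one assumed
way of getting a C locale parse.   It returns -1.0 when the argument is
unusable.

```c
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse all of s with strtod() under the LC_NUMERIC currently in effect.
 * Returns the value, or -1.0 if s is not a finite, non-negative number.
 */
static double
parse_whole(const char *s)
{
	char *end;
	double v;

	errno = 0;
	v = strtod(s, &end);
	if (end == s || *end != '\0' || errno == ERANGE)
		return -1.0;
	if (!isfinite(v) || v < 0.0)
		return -1.0;
	return v;
}

/* Hypothetical entry point: user's locale first, then the C locale. */
double
parse_seconds(const char *s)
{
	const char *cur;
	char *saved;
	double v;

	assert(s != NULL);

	/* First attempt: whatever locale the caller has set up. */
	if ((v = parse_whole(s)) >= 0.0)
		return v;

	/* Second attempt: the C locale, restoring the old one afterwards. */
	cur = setlocale(LC_NUMERIC, NULL);
	if (cur == NULL || (saved = malloc(strlen(cur) + 1)) == NULL)
		return -1.0;
	strcpy(saved, cur);
	setlocale(LC_NUMERIC, "C");
	v = parse_whole(s);
	setlocale(LC_NUMERIC, saved);
	free(saved);
	return v;
}
```

The caller would need to have done setlocale(LC_ALL, "") beforehand for
the first attempt to use the user's locale at all; without that, both
attempts parse in the C locale.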

This keeps the traditional behaviour of NetBSD sleep (with some error
checking added, anticipating dholland's PR 53910 before he submitted
it) while also allowing scripts to sanely (well, as sanely as they can
ever use non-integral inputs) use standard C floats as the input.   As
you point out, a floating point number is, rightly or wrongly, currently
only parsed as a string of digits and an optional single radix character,
with no grouping chars allowed.   [Aside: hex counts as "digits".]

The question of whether sleep (and perhaps other commands) ought
to parse their args in a locale specific manner is a different issue, and
one worthy of considerable discussion.   I have no strong opinion on
this since, as has been pointed out, it does not really affect me much.

  | Arguing about locale behavior based on OpenBSD doesn't work,
  | since they intentionally doesn't implement most of it anyway.

I do not think that was even the intent.   In order to determine the
"how should we parse the args" question, Christos suggested looking
at how other systems do it - which is valid data to have.   If we
collectively come up with some particularly compelling reason that
we should do it one way or the other, that we mostly agree on, then what
other systems do is largely irrelevant.   If there is no particularly good
reason to prefer one way or the other, or we cannot agree (some
developers/users prefer one, and others prefer the other), then at least
acting the same way as (most) other systems (which allow non-integer
args) might be enough to decide one way or the other.

  | Let's take a step back from the implementation details. I consider the
  | command line interface of a program part of the shell language universe.
  | Programming languages shouldn't change arbitrary based on locale
  | settings. Otherwise you get the VBA madness. That's very different from
  | the data being processed or messages used for interacting with the user.
  | Valery mentioned the Postscript example already.

That's all a valid point, which really should be made in some discussion in
messages on a better list than source-changes-d (in messages with the
Subject header "Re: CVS commit: src/bin/sleep").   This is not where
someone from the future would expect to find a discussion of the
philosophical (or technical) reasons why we should decide one way
or the other, should they be wondering why we made the decision now
whichever way we end up making it, if 20 years (or more) into the
future this all comes up again.

But since this is here, one point I'd make is that there is no particular
feature of the command line interface of a program which
distinguishes it from data used for interacting with the user.    The
program does not know from where its args were obtained.   If they're
written in a script, then I absolutely agree with you, it ought to use the
"standard" notation (one way or another).

But I know that I frequently simply type

	sleep n

into my shell, and then follow that by a bunch of commands I want
executed a little later, and while it would be unusual for me to give
fractional seconds in such a case, if I did, I'd normally expect to
enter those in the same format I use for any other floating point number
in my day to day life, which is what I would have LC_NUMERIC
set up to produce and consume (that is, whatever I believe is best for
me, which is not necessarily the same as the guy at the next desk
in the same room ... obviously in the same country.)

Similarly, if a script requests a delay value from the user, as in the

	printf 'How long should the delay be? '
	read delay || exit 1
	sleep "$delay"

type example, which is that?   Data obtained while interacting with the
user, or the command line interface of a program?

This gets a bit messy.   If we returned "sleep" to the way it was
2 weeks ago, only parsing using the user's locale,
scripts could handle that by simply doing

	sleep $(printf %g 1.234)

(assuming a correctly working printf, which we now have in
/usr/bin but not in the version built into /bin/sh (or csh, I think);
fixing that for sh is something I am working on.   I now kind of
understand the point of some locale related code in the FreeBSD
sh that I had been ignoring as "outside my area of expertise").

However, if we decide (which would be the more likely decision
I think) that sleep should only accept C locale floats, and never
parse using LC_NUMERIC from the environment, then what change
can we make to the 3 line printf/read/sleep sequence above to
make that work as expected?    As best as I can tell, there is no
really good way of converting numbers from one locale to another,
except for the special case where the input locale is C/POSIX.
In fact, I am having a very hard time thinking of any way that does
not involve writing new code, but perhaps you, or one of the other
people who deal with locale issues all the time (which is not me)
has a solution to this?
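
To make the "writing new code" option concrete, here is a rough sketch of
the kind of helper that would be needed.   Everything in it is hypothetical:
number_to_c() and locale_number_to_c() are invented names, not existing
library functions, and the approach (drop the locale's grouping string
before the radix, turn the locale's radix string into ".") is only an
assumption about what would be good enough.   It does not check that
grouping separators appear in sensible positions.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite `in`, written with radix string `radix`
 * and grouping string `group` (may be ""), into C locale form in `out`.
 * Returns 0 on success, -1 if `out` is too small or a second radix
 * is found.
 */
int
number_to_c(const char *in, const char *radix, const char *group,
    char *out, size_t outsz)
{
	size_t rlen = strlen(radix), glen = strlen(group), n = 0;
	int seen_radix = 0;

	assert(rlen > 0);
	if (outsz == 0)
		return -1;

	while (*in != '\0') {
		char c;

		/* Grouping separators are only dropped before the radix. */
		if (glen > 0 && !seen_radix && strncmp(in, group, glen) == 0) {
			in += glen;
			continue;
		}
		if (strncmp(in, radix, rlen) == 0) {
			if (seen_radix)
				return -1;
			seen_radix = 1;
			c = '.';
			in += rlen;
		} else
			c = *in++;
		if (n + 1 >= outsz)
			return -1;
		out[n++] = c;
	}
	out[n] = '\0';
	return 0;
}

/* The same, taking the separators from the current LC_NUMERIC. */
int
locale_number_to_c(const char *in, char *out, size_t outsz)
{
	const struct lconv *lc = localeconv();

	return number_to_c(in, lc->decimal_point, lc->thousands_sep,
	    out, outsz);
}
```

With German conventions (radix ",", grouping ".") this turns "1.234,5"
into "1234.5", which a C locale strtod() then accepts.   A small utility
wrapping locale_number_to_c() could sit between the "read delay" and the
sleep in the script above, but as far as I know no such utility exists
today.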

kre


