Source-Changes-D archive


Re: CVS commit: src/bin/sleep



    Date:        Fri, 25 Jan 2019 14:04:07 +0300
    From:        Valery Ushakov <uwe%stderr.spb.ru@localhost>
    Message-ID:  <20190125110407.GE18200%pony.stderr.spb.ru@localhost>

  | I don't understand why the locale support in that
  | particular place is not ripped out immediately when discovered.

Because it has been there a very long time, and no-one has
complained about it, and I have no idea to what extent it might
be being actively used.  Do you?

Also because it was quite clearly done deliberately ... there was
a PR, in 1997 (PR#3914), which requested support for non-integral
numbers of seconds.  The PR supplied code to implement it, which
(aside from ugly formatting) looks as if it would have been just fine, 
and which most certainly handled only '.' as the radix char.   The code
supplied apparently came from OpenBSD (whether the submitter of
the PR wrote it for OpenBSD, or just took it from there is not clear,
but I think probably the former.)

The functionality was implemented, but with totally different code,
precisely to make locale specific input work (all of this is in the
CVS logs, the PR, and the comments in the code).  While I was
using NetBSD (to an extent) at that time, I certainly was not paying
any attention to changes like that.   But the time to complain about
it would have been then, or soon after.

  | If the problem described in the original report is not a gross and
  | cynical violation of POLA I don't know what is.

No, I agree, which is why I made the change that makes it
always attempt to use the C locale if there is a parse error
converting the string (if you just say "2" it makes no difference...)

The way it is done now is not very nice, and I do plan on
changing that; the new version is much nicer ... but the
functionality is unaltered.
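A minimal sketch of that fallback, for readers who want to see it in code. parse_seconds() is a hypothetical helper, not the actual NetBSD sleep.c code: it tries strtod() in the user's LC_NUMERIC locale and, only if that leaves unparsed characters, retries in the "C" locale. Swapping locales with setlocale() changes process-wide state, which is tolerable in a single-threaded utility like sleep. Range checks (negative values, inf, nan, leading blanks) are deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not NetBSD's sleep.c: parse a seconds argument,
 * first in the current LC_NUMERIC locale, then (only on a parse error)
 * in the "C" locale.  Returns 0 and sets *out on success, -1 otherwise.
 */
static int
parse_seconds(const char *arg, double *out)
{
	char *end;
	double v = strtod(arg, &end);

	if (end != arg && *end == '\0') {
		*out = v;		/* whole string parsed in the user's locale */
		return 0;
	}

	/* Remember the user's LC_NUMERIC; setlocale() may reuse its buffer. */
	const char *cur = setlocale(LC_NUMERIC, NULL);
	char *saved = (cur != NULL) ? strdup(cur) : NULL;

	setlocale(LC_NUMERIC, "C");
	v = strtod(arg, &end);
	if (saved != NULL) {
		setlocale(LC_NUMERIC, saved);	/* restore the user's choice */
		free(saved);
	}

	if (end == arg || *end != '\0')
		return -1;		/* not a number in either locale */
	*out = v;
	return 0;
}
```

A program using this would still call setlocale(LC_ALL, "") at startup as usual; the helper only borrows the "C" locale for the retry.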

  | I have posted about this to the original thread on netbsd-users@ when
  | the issue came up, before any changes were made.

That I did not see ... but netbsd-users wouldn't be the correct forum
either.   I'd suggest either current-users or tech-userlevel, with a
Subject that makes it obvious that relevant concerned people should
take a look - not a reply to some other random message.

I don't much care about the outcome; as you have suggested,
locales don't have a lot of influence on me, even though I do
live in a non-ASCII country (just as non-ASCII as yours, perhaps
more so), and I certainly make no claims to knowing anything
much about locales.  I would never have deliberately added
code to make sleep handle non-C locales, but it could have
easily happened by accident.

I mean, who'd guess that strtod() would parse numbers in a
locale specific way?    (Sleep did not use that, it used atof(),
but atof() is just strtod() with error checking turned off....)
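The locale dependence is easy to observe. The radix character strtod() accepts is the one localeconv() reports for LC_NUMERIC, and the C standard specifies atof(s) as equivalent to strtod(s, (char **)NULL) apart from error handling, so atof() inherits the same behaviour. The helper names below are illustrative only.

```c
#include <locale.h>
#include <stdlib.h>

/* Radix character that strtod()/atof() use under the current LC_NUMERIC. */
static char
current_radix(void)
{
	return localeconv()->decimal_point[0];
}

/*
 * Length of the prefix of s that strtod() accepts.  Parsing stops at the
 * first character it does not recognise, and that includes a radix
 * character belonging to some other locale.
 */
static size_t
numeric_prefix_len(const char *s)
{
	char *end;

	(void)strtod(s, &end);
	return (size_t)(end - s);
}
```

Before any setlocale() call a program runs in the "C" locale, so current_radix() is '.' and numeric_prefix_len("0,5") is 1. In a locale whose radix is ',' (a German locale, say, if installed) the roles swap and "0.5" stops after the "0".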

How many other utilities do we have that have the same issue?

  | The other mail that made it to the list was about an openwindows
  | program in sunos (mail? i don't remember) that accidentally generated
  | PostScript with locale specific floating point numbers.  As you can
  | imagine PS interpreter didn't know how to interpret 0,1 0,1 rmoveto

That message I did see.   It does not qualify as anything which
would prompt a reasoned discussion on how sleep should work.

What's more, parsing input can allow more flexibility than is possible
when generating output: for input we can accept either form, but for
output the code needs to select one or the other.
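On the output side, a program writing something like PostScript has to commit to '.' whatever the user's LC_NUMERIC says. A minimal sketch, using a hypothetical format_c_number() helper that switches to the "C" locale just for the formatting call. setlocale() is process-wide and not thread-safe; where available, POSIX newlocale()/uselocale() avoid that.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: format v with '.' as the radix character no matter
 * which LC_NUMERIC the user selected.  Returns snprintf()'s result.
 */
static int
format_c_number(char *buf, size_t len, double v)
{
	const char *cur = setlocale(LC_NUMERIC, NULL);
	char *saved = (cur != NULL) ? strdup(cur) : NULL;
	int n;

	setlocale(LC_NUMERIC, "C");
	n = snprintf(buf, len, "%g", v);
	if (saved != NULL) {
		setlocale(LC_NUMERIC, saved);	/* put the user's locale back */
		free(saved);
	}
	return n;
}
```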

As a semi-irrelevant aside, did you know that sleep can also accept
its "seconds" argument in hex ("sleep 0xA" is the same as "sleep 10")?
Unlike fractional seconds, that one is not documented -- but there is
(and has been for years) an ATF test to make sure that it continues
to work.    This is all just fallout from the use of atof() (aka strtod()).
Those parse hex, so sleep does as well...   But someone thought
it was important enough to actually validate that it works correctly.
(I added a couple more tests, testing fractional hex input, the other
day, but the original hex test has been there for years.)
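The hex behaviour comes straight from strtod(), which since C99 accepts hexadecimal floating constants, including a fractional part and a binary exponent ("0x1p-1" is 0.5). A small check using a hypothetical parses_to() helper: parses_to("0xA", 10.0) and parses_to("0x1.8", 1.5) both hold, since 0x1.8 is 1 + 8/16. Note that the '.' inside a hex constant is also the locale's radix character, so fractional hex input is subject to the same locale issue as 0.5.

```c
#include <stdlib.h>

/* Hypothetical helper: nonzero if strtod() consumes all of s and yields want. */
static int
parses_to(const char *s, double want)
{
	char *end;
	double v = strtod(s, &end);

	return end != s && *end == '\0' && v == want;
}
```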

  | This sleep fiasco is up there with that story.

Fiasco?    What fiasco?   The original report was about a user seeing
annoying messages when he restarted some rc.d daemon.   Aside from
the messages, everything was working - though perhaps the sleep
loop went around a few more times than anticipated - depends upon
whether or not printing the error message took more than 50ms.

This was one of the more innocuous issues imaginable really, no harm
was done.   Just an (initially unexplained) irritation.   Hardly a fiasco!

What's more, the only time it would ever have actually "failed", was with
a sleep duration < 1s (that is: sleep 0.x for some value of x).   Any other
usage (eg: sleep 2.5) would have (seemed to) work just fine (ie: no
annoying message, and no immediate exit from sleep) whatever the locale.

And last (and not really related) ...

  |  I try to avoid locale with the exception of LC_CTYPE.

Then maybe you can help ... in sh, I have implemented (a year or
so ago, I forget when) the coming new POSIX quoting format $'...'
(which of course came from some other shell originally, and is now
implemented by just about all shells I believe) which is supposed to
implement something approximating C "" strings (with all the \
escape sequences in those, plus a few more) but which is
otherwise identical to sh '...' quoting.

One pair of escape sequences (not sure if these are in C or not, but I
suspect they are) is \uXXXX and \UXXXXXXXX, which are the same except
for the number of hex digits that may follow the 'u' (up to 4) or the
'U' (up to 8).
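A minimal sketch of reading the digits after \u or \U under the "up to 4 / up to 8" rule described above, stopping at the first non-hex character. (C's own universal character names require exactly 4 or 8 digits, so this is looser than C.) parse_ucn() and hexval() are hypothetical names, and whether zero digits should be an error is left to the caller.

```c
#include <ctype.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int
hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower((unsigned char)c) - 'a' + 10;
}

/*
 * Hypothetical helper: read at most maxdigits hex digits from p
 * (4 after \u, 8 after \U).  Stores the value in *cp and returns the
 * number of digits consumed, which may be 0.
 */
static int
parse_ucn(const char *p, int maxdigits, unsigned long *cp)
{
	unsigned long v = 0;
	int n = 0;

	while (n < maxdigits && isxdigit((unsigned char)p[n])) {
		v = v * 16 + (unsigned long)hexval(p[n]);
		n++;
	}
	*cp = v;
	return n;
}
```

With this rule "\u1F600" reads only four digits (U+1F60, followed by a literal '0'), while "\U1F600" yields U+1F600.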

The intent is to allow the script (the user) to enter any Unicode
code point, reliably, into the sh input (as an arg for a command,
to assign to a variable, or anything else an arbitrary string can
be used for in sh.)

All of this so far is easy.   The current implementation handles
that, and generates a UTF-8 string into the word that is being
produced, where the UTF-8 string is the encoding of the code
point.   That's easy too, and I believe it works.   But it is wrong.
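For reference, the UTF-8 encoding step itself is short. This is a generic sketch of the standard encoding, not the code in sh; utf8_encode() is a hypothetical name. It refuses surrogates and values above U+10FFFF, which \U can otherwise express; what sh should do with those is a separate decision.

```c
#include <stddef.h>

/*
 * Encode code point cp as UTF-8 into buf (room for 4 bytes needed).
 * Returns the number of bytes written, or 0 if cp is a surrogate or
 * lies beyond U+10FFFF.
 */
static size_t
utf8_encode(unsigned long cp, unsigned char *buf)
{
	if (cp < 0x80) {			/* 1 byte: 0xxxxxxx */
		buf[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {			/* 2 bytes: 110xxxxx 10xxxxxx */
		buf[0] = (unsigned char)(0xC0 | (cp >> 6));
		buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)	/* UTF-16 surrogates */
		return 0;
	if (cp < 0x10000) {			/* 3 bytes */
		buf[0] = (unsigned char)(0xE0 | (cp >> 12));
		buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {			/* 4 bytes */
		buf[0] = (unsigned char)(0xF0 | (cp >> 18));
		buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```

For example, U+00E9 encodes as C3 A9 and U+20AC as E2 82 AC.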

What should be generated is the byte sequence that represents
the code point identified, in the user's current locale (LC_CTYPE)
(or, I think, '?' or something if there is no way to achieve that.)
Of course, if the locale uses UTF-8 encoding, as most do, or can,
all is fine, or should be (as far as ignorant ivory tower dwelling
me can tell).

I have no idea how to make the correct output happen; coding
locales (as you quite correctly suggested) is not my thing.  I
assume some iconv() invocation or something is needed.
So I just punted...   (the man page says so!)
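One possible shape for the missing piece, as a sketch only: take the UTF-8 bytes sh already produces and use iconv() to convert them to the codeset that nl_langinfo(CODESET) reports for the current LC_CTYPE, substituting '?' when that fails. utf8_to_locale() is a hypothetical name. It assumes the POSIX iconv() prototype with a char ** input buffer; some systems (NetBSD among them, historically) declared const char **, which would need a cast. POSIX allows iconv() to make an implementation-defined substitution for a character with no exact equivalent and to count it in the return value, so any nonzero count is treated as failure here. A real implementation would also cache the iconv_t rather than opening one per character.

```c
#include <iconv.h>
#include <langinfo.h>
#include <string.h>

/*
 * Hypothetical helper: convert the UTF-8 bytes for one character into the
 * LC_CTYPE codeset, NUL-terminated, in out.  On any failure writes "?"
 * instead and returns -1; returns 0 on an exact conversion.
 */
static int
utf8_to_locale(const char *utf8, char *out, size_t outsz)
{
	char inbuf[8];
	size_t inleft = strlen(utf8);
	int ok = 0;

	if (outsz < 2)
		return -1;
	if (inleft < sizeof inbuf) {
		iconv_t cd = iconv_open(nl_langinfo(CODESET), "UTF-8");

		if (cd != (iconv_t)-1) {
			char *in = inbuf, *o = out;
			size_t oleft = outsz - 1;	/* keep room for the NUL */

			memcpy(inbuf, utf8, inleft + 1);
			/* Convert, then flush any shift state; 0 means exact. */
			if (iconv(cd, &in, &inleft, &o, &oleft) == 0 &&
			    iconv(cd, NULL, NULL, &o, &oleft) == 0) {
				*o = '\0';
				ok = 1;
			}
			iconv_close(cd);
		}
	}
	if (!ok) {
		out[0] = '?';
		out[1] = '\0';
		return -1;
	}
	return 0;
}
```

Called after setlocale(LC_CTYPE, ""), a UTF-8 locale should pass the bytes through unchanged, while an ASCII-only locale would get '?' for anything outside ASCII.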

If you'd like to help and supply the missing code piece, I can
point you to exactly where in the sh sources it should be added.

Same offer to anyone else who could help...

There are other places where sh could really do with some
LC_CTYPE locale type handling done properly, as well
(currently it has essentially none.)    Pattern matching (glob
'*' type patterns, for file names, case statements, and
substring matching (${var%string} etc.)) should all work with
non-ASCII chars.   They don't currently.   There is more
(but I doubt that I can even guess at the extent of what really
ought to be done.)   And all that is apart from the simpler
(I think) issue of generating (appropriate) messages (error
messages mostly, sh does not say much else) in the user's
language.   (Simpler conceptually, I think, but plenty of
translation work needed.)
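As one small illustration of what locale-aware matching involves: ${var%?} should remove one character, not one byte, and under a multibyte LC_CTYPE that means finding character boundaries with mbrlen(). Many multibyte encodings cannot be scanned backwards reliably, so the hypothetical last_char_len() below walks forward from the start. Treating invalid bytes as single characters is an assumption about the desired behaviour, not something sh currently specifies.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: byte length of the last character of s under the
 * current LC_CTYPE, i.e. how many bytes ${var%?} should strip.
 * Returns 0 for an empty string.
 */
static size_t
last_char_len(const char *s)
{
	mbstate_t st;
	size_t left = strlen(s), last = 0;

	memset(&st, 0, sizeof st);
	while (left > 0) {
		size_t n = mbrlen(s, left, &st);

		if (n == (size_t)-1 || n == (size_t)-2) {
			/* Invalid or truncated sequence: count one byte. */
			n = 1;
			memset(&st, 0, sizeof st);
		}
		last = n;
		s += n;
		left -= n;
	}
	return last;
}
```

In a UTF-8 locale, last_char_len() of "café" would be 2 (the bytes of 'é'), whereas a byte-oriented shell strips only 1 and leaves a broken sequence behind.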

kre


