tech-userlevel archive


Re: Bug in scanf?



On Wed, Dec 10, 2025 at 16:03:08 +0100, Martin Husemann wrote:

> The problem is that the syntax given for %f input makes 'e'{digit}+
> optional, but only together and the parser can only decide it is there
> or not when seeing the one char lookahead "e" input. So it decides the
> exponent expression is there and then fails to parse it (no digits
> following), which means scanf aborts due to a match failure and
> returns the number of expressions parsed so far (0 in this example).

That depends on how you organize your parser.  By the time you have
parsed 100 you have a successful match that satisfies the start symbol
of your grammar; anything after it is optional.  So you remember this
fact internally and keep on parsing.  "e" looks ok, so you parse
further.  Now "r" is not a digit, so your attempt to parse the optional
exponent has failed.  But that doesn't cancel the fact that the initial
100 has been successfully parsed.  You can only ungetc one character,
"r", so you do that and return the 100 you saved earlier.

I guess that's how you get the glibc behavior that loses "e".
As far as I can tell that probably violates the part that says:

| The first character, if any, after the input item remains unread.
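
The remember-and-continue strategy described above can be sketched
roughly as follows.  This is only an illustration under simplifying
assumptions (unsigned decimal mantissa, no sign, no fraction), not the
code of glibc or any other libc; the name scan_number is made up for
the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper: read {digit}+ optionally followed by 'e'{digit}+
 * from fp.  Returns 1 and stores the value on success, 0 on match failure.
 */
int
scan_number(FILE *fp, double *out)
{
	int c, seen = 0, expo = 0;
	double mant = 0.0;

	assert(fp != NULL && out != NULL);

	while ((c = getc(fp)) != EOF && isdigit(c)) {
		mant = mant * 10.0 + (c - '0');
		seen = 1;
	}
	if (!seen) {			/* no digits at all: match failure */
		if (c != EOF)
			ungetc(c, fp);
		return 0;
	}
	*out = mant;			/* remember the successful match */

	if (c != 'e') {			/* no exponent: push back one char */
		if (c != EOF)
			ungetc(c, fp);
		return 1;
	}

	c = getc(fp);			/* "e" looks ok, parse further */
	if (c == EOF || !isdigit(c)) {
		/*
		 * The exponent attempt failed.  Only one character can be
		 * pushed back, so the 'e' is lost; the saved mantissa is
		 * still returned.
		 */
		if (c != EOF)
			ungetc(c, fp);
		return 1;
	}

	do {
		expo = expo * 10 + (c - '0');
	} while ((c = getc(fp)) != EOF && isdigit(c));
	if (c != EOF)
		ungetc(c, fp);

	while (expo-- > 0)		/* naive scaling, fine for a sketch */
		mant *= 10.0;
	*out = mant;
	return 1;
}
```

Fed a stream containing "100ergs", this stores 100 and leaves "rgs"
unread: the "e" has been consumed and cannot be restored with a single
ungetc.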


Two related questions are

What should happen when you scanf "100ergs" as just "%f"?

Pedantically speaking, that footnote about "scanf pushes back at most
one input character" talks about a scanf invocation parsing input
according to the passed format string, not about a scanf invocation
processing one single conversion specifier from the passed format
string.  So parsing "100ergs" as "%f%s" has license (unlike "%f" in
isolation) to parse it as "100" + "ergs" without violating the ungetc
constraint.  Though, admittedly, I haven't studied the complete
section to see if that reading is indeed permitted (but nothing in
the epsilon neighborhood seems to directly contradict/prohibit it).

  $ echo '100e11rgs' | sed -E 's/([0-9]+)(e([0-9]+))?(.*)/\1-\3-\4/'
  100-11-rgs
  $ echo '100ergs' | sed -E 's/([0-9]+)(e([0-9]+))?(.*)/\1-\3-\4/'
  100--ergs
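
To see what a given libc actually does with these inputs, a small
probe along the following lines may help.  It uses sscanf rather than
scanf for convenience, on the assumption that both go through the same
conversion code; the helper name probe is made up for this example.
No result is stated for "100ergs" here, since that is exactly where
implementations differ.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: run "%f%9s" over input and return the number of
 * successful conversions as reported by sscanf.  word must have room for
 * at least 10 chars.
 */
int
probe(const char *input, float *f, char *word)
{
	assert(input != NULL && f != NULL && word != NULL);

	*f = 0.0f;		/* known values if a conversion is skipped */
	word[0] = '\0';
	return sscanf(input, "%f%9s", f, word);
}
```

Calling probe("100ergs", &f, word) and printing the return value, f
and word shows whether the library reports a match failure, splits the
input as "100" + "ergs", or drops the "e".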

-uwe

