NetBSD-Bugs archive
Re: bin/57253: xargs wraps lines after ~4k characters
Date: Thu, 02 Mar 2023 16:23:26 +0100
From: Marc Daniel Fege <marc%fege.net@localhost>
Message-ID: <16452345.geO5KgaWL5@nb-marc-fege>
| Just for understanding: how would it break any compatibility pushing limits
| beyond those set now?
It wouldn't, there are no maximums on those values.
| The only side effect I see is scripts written beyond the limits would
| now accidentally becoming functional. No harm here.
No, the side effect is that people write scripts which seem to work,
as they don't encounter the limit, but which then mysteriously fail
on a system with smaller limits, leading the script author to believe
that it must be that system at fault, and so submitting a bug report,
when the real problem is a script assuming something that is nowhere
promised to work (but just happens to). That is, bad user code.
| Meaning: minimum is not something that must not exceeded per se.
No, the system's limit is allowed to be bigger. What they do mean is that you
(the application writer) are not allowed to assume that the limit will
be bigger than the minimum. If it happens to be, great, but for the code
to be correct, it needs to work in restricted environments as well as in
ones with larger limits.
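
To make that concrete, here is a minimal sketch of a script that asks the
system for its limit instead of assuming one. It assumes getconf is
available; 4096 is the POSIX minimum (_POSIX_ARG_MAX), and fits_portably
is a hypothetical helper name, not anything standard. It also ignores the
space the environment takes, which counts against the same limit.

```shell
#!/bin/sh
# POSIX only promises that ARG_MAX is at least _POSIX_ARG_MAX (4096 bytes).
# Anything above that is a property of this particular system.
posix_min=4096

# Ask the running system for its actual limit (arguments plus environment).
arg_max=$(getconf ARG_MAX)

# Hypothetical helper: classify a payload of $1 bytes.
# Simplification: the size of the environment is not accounted for.
fits_portably() {
    if [ "$1" -le "$posix_min" ]; then
        echo portable
    elif [ "$1" -le "$arg_max" ]; then
        echo local-only
    else
        echo too-big
    fi
}

fits_portably 2000      # prints "portable"
fits_portably 100000    # result depends on the system being tested
```

Anything that comes back as "local-only" is exactly the kind of script
that seems to work, and then fails somewhere else.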
| However, if POSIX strictly demands to set those limits you mentioned,
As you just pointed out, they are minimums. We are allowed to use bigger
values. You are not allowed to assume that we do. If you are using xargs
for a sensible reason, then you don't need to worry about it, just use it
correctly. The command will be invoked as many times as are required, and
it is xargs' job to work all that out, so you don't need to bother. That is
its purpose.
It makes no sense to use xargs and then demand that it run everything in
one command invocation. If you want that, just don't use xargs at all, and
simply run the command.
If the command is to be "echo" (or printf, which is a safer, more portable,
alternative) then you get the benefit that the shell never actually runs any
command at all, just does the output itself, in which case ARG_MAX is
irrelevant, all that matters is how much memory the shell can malloc() for
the data to be written (often to exist twice, once as the value of
some variable, then copied to the arg list of echo or printf).
| It is as we see really messy to write portable shell scripts
| even only with tools --may they POSIX compliant or not -- behave
| differently due to a botchy standard (e.g. 'wc').
While it might be correct to call POSIX a "botchy standard" in some
respects, none of this discussion (on xargs, or wc) is part of the problem.
If you carefully use only what POSIX promises will work, and no more, then
you should have few problems. If you do, then submit a bug report (but first
make sure that you are not assuming something not promised to be true).
In most cases those bugs get fixed - not always, as POSIX does have some
"issues" with what it demands - some of it is absurd, and we have simply decided
to ignore it (you might never encounter one of those though).
The current problem is that you just don't like the way that things are
specified to work. That's fine, no-one says you have to like them,
you are free to use your own alternative programs if you like -- it is hard
to get around ARG_MAX however, so if that might be an issue, use a mechanism
other than command line args, to deal with it - like putting data in a file
(as rvp suggested) or passing it through a pipe.
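
A rough sketch of both alternatives, assuming the consumer can read its
items from standard input. generate_items and count_items are made-up
names for illustration, and mktemp is widely available but not required
by POSIX.

```shell
#!/bin/sh
# Hypothetical stand-in for a command that produces a lot of data:
# it just emits 50000 numbered lines.
generate_items() {
    awk 'BEGIN { for (i = 1; i <= 50000; i++) print "item" i }'
}

# Hypothetical consumer: reads one item per line from standard input,
# so the data never has to fit on a command line.
count_items() {
    n=0
    while IFS= read -r item; do
        n=$((n + 1))
    done
    echo "$n"
}

# Through a pipe.
generate_items | count_items

# Through a temporary file.
tmp=$(mktemp) || exit 1
generate_items > "$tmp"
count_items < "$tmp"
rm -f "$tmp"
```

Neither form ever builds an argument list, so ARG_MAX does not come
into it.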
| According to my testings, between
| openSUSE, FreeBSD, NetBSD and MacOS, NetBSD has the narrowest limits to
| process one big line on the shell, as was mentioned in this feed earlier.
That might be true, but ARG_MAX, as it is currently defined anyway, is
an "all ports" constant - the same thing is used on 64 bit intel/AMD
processors, and on ancient atari, sun2 and vax processors. Simply making
it bigger is likely to break some of those systems.
| which is pitty not for me but for the people use NetBSD (as their daily
| driver).
It is actually a benefit. People who use netbsd are more likely to run
into the limit while testing, and so write correct code that will handle
that, rather than write rubbish, and then complain when it doesn't work
elsewhere.
| It is not xargs per se, but /bin/echo, which creates \n after
| a certain chunk of data, until the buffer is empty.
That's not exactly what happens (nor what rvp said happens). If you do
	command_which_generates_lots_of_data | xargs echo
then xargs runs echo as many times as required to output all of the
data, passing as much as the system will allow each time - echo takes
whatever strings it is passed, and writes them, and then ends with a
newline, each time. That's what that command says to do. Assuming that
something like that is what the problem is (or was), that is what you
told it to do. That may not have been what you meant, but it is what
you instructed.
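
The effect is easy to reproduce on a small scale. In this sketch -n 3 is
used only to force small batches so the behaviour is visible; without it
xargs splits wherever the system's argument-size limit requires.
batched_echo is just an illustrative name.

```shell
#!/bin/sh
# Seven input items, at most three arguments per echo invocation.
batched_echo() {
    printf '%s\n' a b c d e f g | xargs -n 3 echo
}

batched_echo
# Three invocations of echo, so three lines of output:
# a b c
# d e f
# g
```

Each line break marks the end of one echo invocation, not anything
echo found in the data.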
If you used instead
	command_which_generates_lots_of_data | xargs echo -n ''; echo
then the -n would cause echo to suppress the newline at the end of
each invocation. Then you need the '' arg added, to make sure a space
is inserted between the last arg output from one invocation of echo and
the next (you'll also get one at the start). The final echo which is
run after the pipeline finishes, just writes a newline (no args to write
and no -n to suppress the newline).
Better to use printf though (IMO), as it should work everywhere (some
systems use a different version of echo, which doesn't have a -n arg,
but uses a different method to suppress the newline, harder to generate
with xargs, and can also interpret the content of the strings, and not
simply write them).
	command_which_generates_lots_of_data | xargs printf ' %s' ; printf '\n'
would work, and in this one you can decide if you'd prefer a leading
space (as shown, like the echo example would give) or a trailing one,
by using '%s ' instead of ' %s'.
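
A small sketch of the same thing, again with -n (here -n 2) only to
force several invocations on a tiny input; joined is a hypothetical
function name.

```shell
#!/bin/sh
# Five items, two per printf invocation, so printf runs three times.
# No invocation writes a newline; the final printf adds exactly one.
joined() {
    printf '%s\n' a b c d e | xargs -n 2 printf ' %s'
    printf '\n'
}

joined
# One line of output, with a leading space:
#  a b c d e
```

The batch boundaries are invisible in the output, which is the point.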
But assuming you're not generating truly huge strings (10MB should be
no problem on most 64 bit systems) you can just use
	printf '%s\n' "$(command_which_generates_lots_of_data)"
instead. (Or if you insist on echo, use that instead, without -n).
Note: not /bin/echo or /usr/bin/printf - you need the version built
into the shell; if you run an external command, ARG_MAX applies.
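
As a sketch of that, assuming your shell has printf built in (most do,
"type printf" will tell you): big_data is a hypothetical generator that
produces 300000 bytes, which is more than NetBSD's ARG_MAX.

```shell
#!/bin/sh
# Hypothetical generator: 300000 'x' characters, no trailing newline.
big_data() {
    dd if=/dev/zero bs=1000 count=300 2>/dev/null | tr '\000' 'x'
}

# The command substitution captures the data in a shell variable.
data=$(big_data)

# printf here is the shell builtin (an assumption about your shell), so
# nothing is exec'd and ARG_MAX is not involved; only memory matters.
printf '%s\n' "$data" | wc -c
# 300000 bytes of data plus the newline printf adds: 300001
```

Replace printf with /usr/bin/printf in the last command and the limit
applies again.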
| Please contemplate my invokes about "wc" and this issue here once again.
If someone wants to change wc so that it produces no spaces when run as
	wc -[lwc] < file
that's fine, but I see no need to do that - and you cannot depend upon
that working, spaces are permitted, so your script must allow for that,
otherwise it has a bug.
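
One way to allow for that, as a sketch: let arithmetic expansion discard
whatever blanks wc chooses to emit. line_count is a hypothetical helper,
and mktemp is used only to set up the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: print the line count of file $1 as a bare integer.
line_count() {
    # wc may pad its output with blanks; arithmetic expansion ignores
    # them, so the result contains digits only.
    echo "$(( $(wc -l < "$1") + 0 ))"
}

tmp=$(mktemp) || exit 1
printf 'one\ntwo\nthree\n' > "$tmp"
line_count "$tmp"    # prints 3, with no surrounding spaces
rm -f "$tmp"
```

A script written this way works whether or not the local wc pads its
output.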
I doubt we're going to change ARG_MAX anytime soon - it is easily big
enough for any rational program (64 times as big as it is required to be)
while not being so big that it becomes impossible to implement on smaller
machines.
If you want a bigger one for a system of yours, you can just change it in
src/sys/sys/syslimits.h and then rebuild and reinstall the entire system.
No guarantees that things will work if you make it too big however.
But you really should embrace the opportunity to deal with the hidden
(invalid) assumptions in your code, and fix them. Long term that will
serve you better. "Works on ..." does not mean "correct".
You reported no actual bugs here, other than ones in your script(s?),
so these PRs will remain closed.
kre