NetBSD-Bugs archive


Re: bin/57253: xargs wraps lines after ~4k characters



The following reply was made to PR bin/57253; it has been noted by GNATS.

From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: Marc Daniel Fege <marc%fege.net@localhost>
Cc: gnats-bugs%netbsd.org@localhost, rvp%sdf.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: bin/57253: xargs wraps lines after ~4k characters
Date: Thu, 02 Mar 2023 23:56:30 +0700

     Date:        Thu, 02 Mar 2023 16:23:26 +0100
     From:        Marc Daniel Fege <marc@fege.net>
     Message-ID:  <16452345.geO5KgaWL5@nb-marc-fege>
 
   | Just for understanding: how would it break any compatibility pushing limits
   | beyond than those set now?
 
 It wouldn't, there are no maximums on those values.
 
   | The only side effect I see is scripts written beyond the limits would
   | now accidentally becoming functional. No harm here.
 
 No, the side effect is that people write scripts which seem to work,
 as they don't encounter the limit, but which then mysteriously fail
 on a system with smaller limits, leading the script author to believe
 that it must be that system at fault, and so submitting a bug report,
 when the real problem is a script assuming something that is nowhere
 promised to work (but just happens to).   That is, bad user code.
 
   | Meaning: minimum is not something that must not exceeded per se.
 
 No, the system is allowed to be bigger.   What they do mean is that you
 (the application writer) are not allowed to assume that the limit will
 be bigger than the minimum.   If it happens to be, great, but for the code
 to be correct, it needs to work in restricted environments as well as in
 ones with larger limits.
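
 A minimal sketch of checking the limit rather than assuming it: getconf
 (a POSIX utility) reports the local ARG_MAX, which can be compared with
 the POSIX minimum of 4096 bytes.  The variable names are illustrative only.

```shell
#!/bin/sh
# Query the local limit instead of assuming one.
posix_min=4096                 # _POSIX_ARG_MAX: the smallest value POSIX permits
arg_max=$(getconf ARG_MAX)     # what this particular system offers

if [ "$arg_max" -ge "$posix_min" ]; then
    echo "ARG_MAX here is $arg_max bytes ($((arg_max / posix_min))x the POSIX minimum)"
fi
```

 A portable script should still behave correctly if the number printed is
 no larger than the minimum.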
 
   | However, if POSIX strictly demands to set those limits you mentioned,
 
 As you just pointed out, they are minimums.   We are allowed to use bigger
 values.   You are not allowed to assume that we do.   If you are using xargs
 for a sensible reason, then you don't need to worry about it, just use it
 correctly.   The command will be invoked as many times as are required, and
 it is xargs' job to work all that out, so you don't need to bother.  That is
 its purpose.
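
 To make the splitting visible with a tiny input, here is a sketch that
 uses the standard -n option to cap the number of arguments per
 invocation.  With large input the size limit triggers the same behaviour
 without being asked.

```shell
#!/bin/sh
# Seven arguments, at most three per invocation: xargs runs echo three
# times, and each run ends its output with its own newline.
out=$(printf '%s\n' 1 2 3 4 5 6 7 | xargs -n 3 echo)
printf '%s\n' "$out"
# prints:
# 1 2 3
# 4 5 6
# 7
```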
 
 It makes no sense to use xargs, and then demand that it run everything in
 one command invocation.   If you want that, just don't use xargs at all, and
 simply run the command.
 
 If the command is to be "echo" (or printf, which is a safer, more portable,
 alternative) then you get the benefit that the shell never actually runs any
 command at all, just does the output itself, in which case ARG_MAX is
 irrelevant, all that matters is how much memory the shell can malloc() for
 the data to be written (often to exist twice, once as the value of
 some variable, then copied to the arg list of echo or printf).
 
   | It is as we see really messy to write portable shell scripts
   | even only with tools --may they POSIX compliant or not -- behave
   | differently due to a botchy standard (e.g. 'wc').
 
 While it might be correct to call POSIX a "botchy standard" in some
 respects, none of this discussion (on xargs, or wc) is part of the problem.
 
 If you carefully use only what POSIX promises will work, and no more, then
 you should have few problems.  If you do, then submit a bug report (but first
 make sure that you are not assuming something not promised to be true).
 In most cases those bugs get fixed - not always, as POSIX does have some
 "issues" with what it demands - some of it is absurd, and we have simply
 decided to ignore it (you might never encounter one of those though).
 
 The current problem is that you just don't like the way that things are
 specified to work.   That's fine, no-one says you have to like them,
 you are free to use your own alternative programs if you like -- it is hard
 to get around ARG_MAX however, so if that might be an issue, use a mechanism
 other than command line args, to deal with it - like putting data in a file
 (as rvp suggested) or passing it through a pipe.
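
 A sketch of the pipe approach, assuming the consumer can take its data
 one item per line on standard input.  The count_items function and the
 awk producer are hypothetical stand-ins for real commands.

```shell
#!/bin/sh
# Hypothetical consumer: reads items from stdin, one per line, so the
# amount of data is never limited by ARG_MAX.
count_items() {
    n=0
    while IFS= read -r item; do
        n=$((n + 1))
    done
    echo "$n"
}

# Producer | consumer: the data travels through the pipe, not an arg list.
total=$(awk 'BEGIN { for (i = 0; i < 20000; i++) print "item" i }' | count_items)
echo "$total"
```

 The same consumer works unchanged with a file: count_items < datafile.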
 
   | According to my testings, between
   | openSUSE, FreeBSD, NetBSD and MacOS, NetBSD has the narrowest limits to
   | process one big line on the shell, as was mentioned in this feed earlier.
 
 That might be true, but ARG_MAX as it is currently defined anyway, is
 an "all ports" constant - the same thing is used on 64 bit intel/AMD
 processors, and on ancient atari, sun2 and vax processors.   Simply making
 it bigger is likely to break some of those systems.
 
   | which is pitty not for me but for the people use NetBSD (as their daily
   | driver).
 
 It is actually a benefit.  People who use NetBSD are more likely to run
 into the limit while testing, and so write correct code that will handle
 that, rather than write rubbish, and then complain when it doesn't work
 elsewhere.
 
   | It is not xargs per se, but /bin/echo, which creates \n after
   | a certain chunk of data, until the buffer is empty.
 
 That's not exactly what happens (nor what rvp said happens).  If you do
 
	command_which_generates_lots_of_data | xargs echo
 
 then xargs runs echo as many times as required to output all of the
 data, passing as much as the system will allow each time - echo takes
 whatever strings it is passed, and writes them, and then ends with a
 newline, each time.   That's what that command says to do.  That is,
 assuming that something like that is what the problem is (or was), that
 is what you told it to do.   That may not have been what you meant, but
 it is what you instructed.
 
 If you used instead
 
	command_which_generates_lots_of_data | xargs echo -n ''; echo
 
 then the -n would cause echo to suppress the newline at the end of
 each invocation.   Then you need the '' arg added, to make sure a space
 is inserted between the last arg output from one invocation of echo and
 the next (you'll also get one at the start).   The final echo which is
 run after the pipeline finishes, just writes a newline (no args to write
 and no -n to suppress the newline).
 
 Better to use printf though (IMO), as it should work everywhere (some
 systems use a different version of echo, which doesn't have a -n arg,
 but uses a different method to suppress the newline, harder to generate
 with xargs, and can also interpret the content of the strings, and not
 simply write them).
 
	command_which_generates_lots_of_data | xargs printf ' %s' ; printf '\n'
 
 would work, and in this one you can decide if you'd prefer a leading
 space (as shown, like the echo example would give) or a trailing one,
 by using '%s ' instead of ' %s'.
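
 A small demonstration of the printf form, using -n 2 to force several
 invocations on a tiny input (a stand-in for what the size limit does
 with a large one).  The joined variable exists only for illustration.

```shell
#!/bin/sh
# Five items, two per printf invocation: three runs, yet one output line,
# because printf writes no newline unless the format asks for one.
joined=$(printf '%s\n' a b c d e | xargs -n 2 printf ' %s'; printf '\n')
printf '[%s]\n' "$joined"
# prints: [ a b c d e]
```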
 
 But assuming you're not generating truly huge strings (10MB should be
 no problem on most 64 bit systems) you can just use
 
	printf '%s\n' "$(command_which_generates_lots_of_data)"
 
 instead.   (Or if you insist on echo, use that instead, without -n).
 Note: not /bin/echo or /usr/bin/printf - you need the version built
 into the shell; if you ran an external command, ARG_MAX would apply.
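
 A sketch of the builtin case, assuming printf is built into the shell in
 use (true of bash and dash, for example, but not guaranteed everywhere).
 The awk producer is a hypothetical stand-in for a command that generates
 a lot of data.

```shell
#!/bin/sh
# Show whether this shell's printf is a builtin or an external program.
type printf

# About 600000 bytes in a single string.
big=$(awk 'BEGIN { for (i = 0; i < 300000; i++) printf "x " }')

# With a builtin printf no new process is executed for printf itself, so
# ARG_MAX never comes into play.  $(( ... )) drops any blanks from wc.
bytes=$(( $(printf '%s\n' "$big" | wc -c) ))
echo "wrote $bytes bytes"
```

 Replacing printf with /usr/bin/printf in the last pipeline is a quick way
 to see the external case fail on systems where the string exceeds the
 exec limits.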
 
   | Please contemplate my invokes about "wc" and this issue here once again.
 
 If someone wants to change wc so that it produces no spaces when run as
 
	wc -[lwc] < file
 
 that's fine, but I see no need to do that - and you cannot depend upon
 that working, spaces are permitted, so your script must allow for that,
 otherwise it has a bug.
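
 One way a script can allow for those spaces, as a sketch: pass the count
 through arithmetic expansion, which discards surrounding blanks.  The
 temporary file is for illustration only, and mktemp, while widely
 available, is not itself required by POSIX.

```shell
#!/bin/sh
tmp=$(mktemp) || exit 1
printf 'one\ntwo\nthree\n' > "$tmp"

# wc may pad its output with blanks; $(( ... )) yields the bare number.
lines=$(( $(wc -l < "$tmp") ))
rm -f "$tmp"

echo "lines=$lines"
# prints: lines=3
```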
 
 I doubt we're going to change ARG_MAX anytime soon - it is easily big
 enough for any rational program (64 times as big as it is required to be)
 while not being so big that it becomes impossible to implement on smaller
 machines.
 
 If you want a bigger one for a system of yours, you can just change it in
 src/sys/sys/syslimits.h  and then rebuild and reinstall the entire system.
 No guarantees that things will work if you make it too big however.
 
 But you really should embrace the opportunity to deal with the hidden
 (invalid) assumptions in your code, and fix them.   Long term that will
 serve you better.   "Works on ..." does not mean "correct".
 
 You reported no actual bugs here, other than ones in your script(s?),
 so these PRs will remain closed.
 
 kre
 

