NetBSD-Bugs archive


Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing



The following reply was made to PR bin/59803; it has been noted by GNATS.

From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: RVP <rvp%SDF.ORG@localhost>
Cc: gnats-bugs%netbsd.org@localhost
Subject: Re: bin/59803: sed(1) conditional branch command confuses subsequent line addressing
Date: Mon, 01 Dec 2025 09:17:38 +0700

     Date:        Mon, 1 Dec 2025 00:10:12 +0000 (UTC)
     From:        RVP <rvp%SDF.ORG@localhost>
     Message-ID:  <71936a11-411a-9ac3-c4cf-e5aef68f3393%SDF.ORG@localhost>
 
   | but, there's a major difference between these two: even in a stream,
   | sed can _always_ retrieve the current line number
 
 Sure, but that's not the point.   The point is what a dual address
 command means, and it isn't the same as what you're imagining.
 
 It is easy to be seduced when the command that is to be executed is
 something simple, like 'p', 'd' or 's', but it isn't always.
 
 Consider a script that accumulates extracts from the text in the hold
 space over a range of input lines: when the first line of the range is
 encountered, things are initialised (the hold space is cleared, or
 whatever else is needed), and when the final line is encountered, the
 hold space is used in whatever fashion is intended.
 
 If the first line is never actually encountered, the initialisation is
 skipped, and if the commands are nevertheless executed on the following
 lines, the result will be a mess.
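To make this concrete, here is a minimal Python sketch (a model, not sed's source code) of a hypothetical "window" reading of a numeric range such as "3,5", in which any line whose number lies between the two addresses is selected whether or not the first line was ever seen. The function name `accumulate_window` and the `skipped` parameter are illustrative assumptions; `skipped` stands for lines on which an earlier 't' branched past the range command. Line `first` clears the buffer (the initialisation) and line `last` emits it.

```python
def accumulate_window(lines, first, last, skipped):
    """Gather lines first..last into a hold buffer, then emit it.

    Hypothetical "window" reading: a line is in range whenever
    first <= n <= last, regardless of whether line `first` was ever seen.
    Lines whose numbers are in `skipped` never reach the range command,
    as if an earlier 't' had branched past it.
    """
    hold = ["stale"]  # leftover hold-space content from earlier lines
    output = []
    for n, text in enumerate(lines, start=1):
        if n in skipped or not (first <= n <= last):
            continue  # range command not executed on this line
        if n == first:
            hold = []  # initialisation: clear the hold space
        hold.append(text)
        if n == last:
            output.append(" ".join(hold))  # finalisation: use what was gathered
    return output


lines = [f"line{i}" for i in range(1, 9)]
print(accumulate_window(lines, 3, 5, skipped=set()))  # ['line3 line4 line5']
print(accumulate_window(lines, 3, 5, skipped={3}))    # ['stale line4 line5']
```

With line 3 skipped, the initialisation never runs and the stale hold-space content leaks into the result; with line 5 skipped, nothing is ever emitted.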
 
 The same applies to the end line of the range: if that one isn't
 encountered, the commands simply keep on being applied, since nothing has
 caused them to stop.   I suspect that your two-line patch didn't handle
 that case; I also suspect that handling it would be a little more complex.
 
 Suppose that in the OP's example the "3,$" were instead "3,5" (with the
 input containing more than 5 lines), that line 5 happened to be the one
 where the substitution occurred, and that the 't' therefore caused the
 dual-address command to be skipped on that line.   Then the range remains
 active and applies to lines 6, 7, 8, ..., continuing until line 5 is
 actually processed (which is unlikely in that scenario!)
 
 It generally isn't difficult to write sed scripts that handle all this
 kind of thing properly (which often means not using explicit line
 numbers, other than perhaps 1 and $), provided that one understands how
 sed is defined to work.   Dual-address commands are not defined to select
 "any line that happens to be at or after the first address and at or
 before the second address"; they "start (only) when the first address is
 found" and "stop when the second address is found (and only then)".
 
 Don't fall into the trap of "it just seems obvious that it should ..." and
 change the behaviour of commands without a careful analysis of why they
 are the way they are.
 
 kre
 
 

