NetBSD-Bugs archive


Re: bin/51171: sed does not match newlines in regexps properly



The following reply was made to PR bin/51171; it has been noted by GNATS.

From: Jarle Greipsland <jarle%uninett.no@localhost>
To: gnats-bugs%NetBSD.org@localhost, kre%munnari.OZ.AU@localhost
Cc: 
Subject: Re: bin/51171: sed does not match newlines in regexps properly
Date: Fri, 27 May 2016 12:05:18 +0200 (CEST)

 Robert Elz <kre%munnari.OZ.AU@localhost> writes:
 >      From:        jarle%uninett.no@localhost
 >      Message-ID:  <20160527074000.50F8B7AABE%mollari.NetBSD.org@localhost>
   
 >    | -------- script.sed ---------
 >    | 1{h;d;}
 >    | 2{H;d;}
 >    | 3{H
 >    |   x
 >    | # Pattern space: line1 \n line2 \n line3 (without spaces)
 >    | # Now, delete the first character of line1 and line2
 >    |   s/^[^\n]\([^\n]*\n\)[^\n]/\1/
 >    | }
 >    | -----------------------------
 >    | 
 >    | On NetBSD 6, the command
 >    |   (echo abc; echo def; echo ghi) | sed -f script.sed
 >    | will print:
 >    | bc
 >    | ef
 >    | ghi
 >    | which is what I would expect.
 >  
 >  If it does, that is a bug: the expression [^\n] matches a character
 >  that is neither a '\' nor an 'n' and has nothing at all to do with newlines.
 >  No escape characters work inside [] (though there is a whole set of
 >  magic combinations that mean specific things).
 You are right.  I shall have to adjust my expectations.  And
 someone might want to adjust sed's behavior in NetBSD 6.  And GNU
 sed also, it would seem.  Oh well.  Lesson learned: don't rely on
 the behavior of \n in brackets.
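 
 A quick way to see which interpretation a given sed uses is sketched
 below.  The function name is made up for illustration, and the two
 expected outputs assume the implementation follows either the strict
 reading quoted above or the newline extension; anything else is
 reported as unexpected.

```shell
#!/bin/sh
# Hypothetical helper: report how the local sed reads \n inside [].
bracket_newline_semantics() {
    # The input line is four characters: a, backslash, n, b.
    out=$(printf '%s\n' 'a\nb' | sed 's/[^\n]/X/g')
    case $out in
        # Backslash and n were left alone, so both are literal set members.
        'X\nX') printf '%s\n' 'posix: backslash and n are literal inside []' ;;
        # All four characters were replaced, so only newline was excluded.
        'XXXX') printf '%s\n' 'extension: backslash-n inside [] means newline' ;;
        *)      printf '%s\n' "unexpected: $out" ;;
    esac
}

bracket_newline_semantics
```

 A script that must behave the same everywhere should not depend on
 either answer.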
 
 This problem report should probably be closed.
 
 >  As best I can tell (having looked for it for ages) there is no way in
 >  sed to match any character other than a newline.   I resorted to s/\n/X/
 >  where X was a character I knew could not appear in the text (because
 >  earlier commands had removed all instances), followed by [^X] in the
 >  expression to do the work, followed by s/X/${nl}/ (${nl} is a literal
 >  newline).   Truly ugly, but I believe the only way possible.
 Or even uglier, one could try to do dummy \n->\n substitutions
 for positions where one does not wish a \n to match, and use
 control flow to branch to the appropriate substitutions.
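 
 For the record, the placeholder workaround applied to the script from
 the original report might look like the sketch below.  It assumes '@'
 never occurs in the input (an arbitrary choice for illustration, as
 is the function name), and it writes the newline in the replacement
 as a backslash followed by a literal newline.

```shell
#!/bin/sh
# Hypothetical helper: delete the first character of line 1 and line 2
# of a three-line input without using \n inside a bracket expression.
# Assumes the placeholder '@' does not appear anywhere in the input.
strip_first_two() {
    sed '
1{h;d;}
2{H;d;}
3{
  H
  x
  # Pattern space now holds line1 \n line2 \n line3; hide the newlines.
  s/\n/@/g
  # [^@] now stands in for "any character other than a newline".
  s/^[^@]\([^@]*@\)[^@]/\1/
  # Restore the newlines: backslash followed by a literal newline.
  s/@/\
/g
}
'
}

(echo abc; echo def; echo ghi) | strip_first_two
```

 With the abc/def/ghi input this should print bc, ef and ghi on
 separate lines, independent of how the sed in question treats \n
 inside brackets.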
 
 >  The best solution I can think of is to add a new char class that contains
 >  just newline, say [:nl:] and then use [^[:nl:]] but no sed does anything
 >  like that that I am aware of.
 That would have been nice, yes.
 					-jarle
 --
 we all hack on a broken subroutine, a broken subroutine, a broken subroutine...
 					-- Kenneth Stailey
 