Current-Users archive


Re: Weirdness in comm(1)



    Date:        Sat, 28 Nov 2009 18:59:39 +0100
    From:        Joerg Sonnenberger <joerg%britannica.bec.de@localhost>
    Message-ID:  <20091128175939.GA10476%britannica.bec.de@localhost>

  | Even for the specific case of comm you can't just strip off the
  | trailing newline. You have to remember if you had one as well and only
  | compare lines as equal if either both or neither of them has it.

I'd fix that by having the line-reading function return the number
of bytes consumed (not necessarily the strlen() of the returned line).
If the line contents compare equal, then compare those byte counts, and
use that as the result.
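
A minimal sketch of that idea, not taken from the comm source: the
names read_line() and compare_lines() are made up for illustration,
POSIX getline() is assumed to be available, and strcmp() stands in for
whatever comparison the real program uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read one line into *buf, strip a trailing '\n'
 * if there is one, and store the remaining text length in *len.
 * Returns the number of bytes consumed from fp (text length plus one
 * when a newline was present), or -1 at end of file or on error.
 */
static ssize_t
read_line(FILE *fp, char **buf, size_t *cap, size_t *len)
{
	ssize_t n;

	assert(fp != NULL && buf != NULL && cap != NULL && len != NULL);

	n = getline(buf, cap, fp);
	if (n < 0)
		return -1;
	*len = (size_t)n;
	if (*len > 0 && (*buf)[*len - 1] == '\n')
		(*buf)[--*len] = '\0';
	return n;
}

/*
 * Hypothetical comparison: compare the text first; if the text is
 * equal, fall back to the consumed byte counts, so that "abc\n" and
 * an unterminated "abc" are not treated as the same line.
 */
static int
compare_lines(const char *a, ssize_t aused, const char *b, ssize_t bused)
{
	int r = strcmp(a, b);

	if (r != 0)
		return r;
	return (aused > bused) - (aused < bused);
}
```

With this, a line "abc\n" consumes 4 bytes and a final unterminated
"abc" consumes 3, so the two compare unequal even though the stripped
text is identical. Since strcmp() stops at the first \0, the sketch
does nothing special for lines with embedded \0's.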

Fortunately, comm has no real issue with adding a \n after the last line
of a file when that line lacks one - it must do that, or a later line from
the other file would be combined with that one - and if it happens for
the file that ends first, to be consistent it has to happen for the file
that ends second too.

Personally, I think that all text processing applications (ones intended
to process text files) should be defined to have undefined behaviour when
given a non-text file as input (including a file that is not composed of
some number of lines (including 0), each of which ends in a \n).

Another issue comm (and other similar applications) has is in dealing
with lines with embedded \0's - it doesn't handle them properly now,
and I don't think it should be required to.

kre


