NetBSD-Users archive


Re: Ideas for stripping tags from document



Hey Johnny, that thing with tr -d did not work. When I read the
manpage I got an idea:
character classes (in this case [:cntrl:]). It turns out that one can do

s/[[:cntrl:]]/\n/g

using Perl. That fixed the problem with \x{d}. I still need to fix
\x{92}, \x{93}, etc.

It would be nice to do: system(tr -d .... $text), then write the
result to a filehandle.
Where do you get the octal values for \x{92}, \x{93}, etc.?
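
For the octal question, a rough sketch: the values are just the same
byte written in base 8, and printf can do the conversion. The helper
names strip_bytes and to_ascii below are hypothetical, and the
replacement characters assume the file is Windows-1252 (0x92 = right
single quote, 0x93/0x94 = double quotes, 0x95 = bullet), which this
thread does not confirm. If the file is really UTF-8, these characters
are multi-byte sequences and a byte-wise tr will not match them cleanly.

```sh
# Hex to octal: printf accepts C-style hex constants as numeric arguments.
printf '%o\n' 0x92 0x93 0x94 0x95    # prints 222, 223, 224, 225 (one per line)

# Hypothetical helper: delete the four bytes outright (byte-wise, C locale).
strip_bytes() {
    LC_ALL=C tr -d '\222\223\224\225'
}

# Hypothetical helper: map the bytes to ASCII look-alikes instead.
# Assumes Windows-1252 input: 0x92 -> ', 0x93 and 0x94 -> ", 0x95 -> *
to_ascii() {
    LC_ALL=C tr '\222\223\224\225' "'\"\"*"
}
```

Either one reads stdin and writes stdout, e.g. to_ascii < infile > outfile,
so from Perl it is probably simpler to filter the whole file this way
than to pass $text through system().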

On Sun, Jan 17, 2021 at 5:35 AM Johnny Billquist <bqt%update.uu.se@localhost> wrote:
>
> On 2021-01-17 10:57, Ignatios Souvatzis (GSG) wrote:
> >
> >
> > On 17 January 2021 00:01:23 CET, Johnny Billquist <bqt%update.uu.se@localhost> wrote:
> >> On 2021-01-16 19:45, Todd Gruhn wrote:
> >>> I have a large document (18,000 lines). It is full of tags such as
> >>> <93>, <94>, <95>.
> >>>
> >>> If I view the doc in a Perl editor I see \x{93}, \x{94}, \x{95} ...
> >>>
> >>> Is there a pkg or command to strip these tags and leave the text ?
> >>
> >> tr -d "\223\224\225" < infile > outfile
> >>
> > I'd convert them to ", ", and maybe *, if you really want pure ASCII, but yes.
>
> Well, he did ask how to strip them.
>
> But sure, tr can be used for replacing them with other characters as
> well, obviously. Trivial, in fact.
>
>    Johnny
>
> --
> Johnny Billquist                  || "I'm on a bus
>                                    ||  on a psychedelic trip
> email: bqt%softjar.se@localhost             ||  Reading murder books
> pdp is alive!                     ||  tryin' to stay hip" - B. Idol

