NetBSD-Users archive


Re: Ideas for stripping tags from document



Thanks for the idea, Ignatios.

I will try this.

On Sat, Jan 16, 2021 at 3:00 PM <ignatios%cs.uni-bonn.de@localhost> wrote:
>
> Hi,
>
> On Sat, Jan 16, 2021 at 01:45:45PM -0500, Todd Gruhn wrote:
> > I have a large document (18,000 lines). It is full of tags such as <93>,
> > <94>, <95>.
> >
> > If I view the doc in a PERL editor I see \x{93}, \x{94}, \x{95} ...
>
> Ahem - are you sure (have you looked at a few of them with hexdump -C)?
>
> Your perl editor displays \x{93}, your other editor <93>, in reality
> they might be just one octet with that value.
> Sounds like some windows-1252, where they're “, ” and • , respectively.
>
> > Is there a pkg or command to strip these tags and leave the text ?
>
> In that case I'd try
>
>         iconv -f windows-1252 -t utf-8 < foo > bar
>
> Regards,
>         -is
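
The check and the conversion suggested above can be tried on a throwaway
file first. This is only a sketch under assumptions: the file names
(sample.txt and friends) are made up for illustration, and it presumes the
stray bytes really are single windows-1252 octets, as guessed in the reply.
The od fallback is there only in case hexdump is not installed. The last
command shows the literal "strip" alternative with tr, which discards the
quotes and bullets instead of converting them.

```shell
# Build a small test file containing the three suspect octets
# (octal 223, 224, 225 are hex 93, 94, 95).
printf '\223quoted\224 \225 item\n' > sample.txt

# Look at the raw bytes; each "tag" should show up as one octet.
hexdump -C sample.txt 2>/dev/null || od -An -tx1 sample.txt

# Convert from windows-1252 to UTF-8, so 93/94/95 become the
# curly quotes and the bullet.
iconv -f windows-1252 -t utf-8 < sample.txt > sample.utf8.txt

# Alternative: delete the three octets outright and keep only the text.
LC_ALL=C tr -d '\223\224\225' < sample.txt > sample.stripped.txt
```

If the hexdump output shows something other than single 93/94/95 octets
(for example the literal characters "<93>"), the encoding guess does not
apply and neither command will help as written.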

