NetBSD-Users archive

Re: Ideas for stripping tags from document



Hi,

On Sat, Jan 16, 2021 at 01:45:45PM -0500, Todd Gruhn wrote:
> I have a large document (18,000L). It is full of tags such as <93>
> ,<94> , <95> .
> 
> If I view the doc in a PERL editor I see \x{93} , \x{94} , \x{95} ...

Ahem - are you sure (have you looked at a few of them with hexdump -C)?

Your perl editor displays \x{93} and your other editor <93>, but in
reality each of them might be just one octet with that value (0x93).
That sounds like windows-1252, where 0x93, 0x94 and 0x95 are “, ” and •,
respectively.
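
A rough sketch of that check, in case it helps. The file name sample.txt
is made up, and the printf line only fabricates a small test input holding
the three octets; on the real document you would run just the hexdump and
tr lines against your own file.

```shell
# Fabricate a small test file (hypothetical name) containing the single
# octets 0x93, 0x94 and 0x95 (octal 223, 224, 225) around ASCII text.
printf '\223quoted\224 \225 item\n' > sample.txt

# Look at the raw bytes. A real one-octet "tag" shows up as 93, 94 or 95
# in the hex column, not as the four characters '<', '9', '3', '>'.
hexdump -C sample.txt | head -n 5

# Count how many of those octets the file contains: delete everything
# except the three byte values, then count what is left.
LC_ALL=C tr -cd '\223\224\225' < sample.txt | wc -c
```

If the count is non-zero and hexdump shows lone 93/94/95 bytes, the
"tags" are most likely just how the editors render those octets.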

> Is there a pkg or command to strip these tags and leave the text ?

In that case I'd try

	iconv -f windows-1252 -t utf-8 < foo > bar
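
To see what the conversion does before running it on the whole document,
a small trial along these lines should work. It is only a sketch: foo and
bar are placeholder names as above, and the printf line just creates test
input with the three octets in it.

```shell
# Fabricate a test input (placeholder name foo) with octets 0x93, 0x94, 0x95.
printf '\223quoted\224 \225 item\n' > foo

# Convert from windows-1252 to UTF-8, exactly as in the command above.
iconv -f windows-1252 -t utf-8 < foo > bar

# Each of the three octets becomes a three-byte UTF-8 sequence
# (e2 80 9c, e2 80 9d, e2 80 a2), so bar is 6 bytes longer than foo.
wc -c foo bar

# Inspect the converted bytes.
hexdump -C bar | head -n 5
```

If iconv complains about an illegal input sequence on the real file, the
document is probably not (entirely) windows-1252 after all.
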

Regards,
	-is

