Port-vax archive


Re: rtVAX300 .. need help..



On 10/01/13 12:14 AM, Mouse wrote:
Snagged.  Thank you for the pointer.  Even without OCR, this is
going to be an interesting read.

I've never seen a document improved by OCR.  Seen plenty ruined
beyond salvation.  They should outlaw it.

The older a book is, the better typeset it is, and scanning it is the
best outcome.

Just imho as a typographer, of course.

You can't grep images.

Normally, my first operation on PDFs is to convert them to text files,
which text files I then use.  The VARM is a prime candidate for
conversion to text, because even its diagrams are done as text (at
least in my VARM; even in the one bqt pointed at, the only things I've
seen so far that aren't text are the |d|i|g|i|t|a|l| logos - and the
large text on the cover, if you count that).

I would never suggest throwing away the original scans.  But I would
suggest converting them to text anyway, for the sake of searching and
use in non-GUI interfaces.

Yes, that's a good idea, but I see too many books just OCR'd without any originals, rendering them all but worthless.

--Toby


/~\ The ASCII                             Mouse
\ / Ribbon Campaign
  X  Against HTML               mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B



