pkgsrc-Users archive


Re: Please suggest OCR software



On Wed, Apr 03, 2019 at 04:19:05PM +0200, Joerg Sonnenberger wrote:
> On Wed, Apr 03, 2019 at 02:24:20PM +0200, Adam wrote:
> > Tesseract is quite nice and easy to use. It also has an easy API. The package is large, because it contains data for many languages.
> 
> Agreed, but don't forget to tell Tesseract the correct language, it
> makes a huge difference when it comes to recognition.

Thanks.

I am trying to find an option that preserves the layout, or at least
keeps the layout from affecting the OCR results. The scan contains tabular
data with English text in the cells.

I think the borders of the table are interfering with the text
recognition. The results are good for text outside the tables but
noticeably worse for text inside the cells, even though both use the same
font etc.
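
A minimal sketch of one possible workaround, assuming the scan has already
been binarized into rows of 0/1 values (1 = ink): erase every horizontal or
vertical ink run that is too long to belong to a character, then run the OCR
on what is left. The function name remove_long_runs and the min_run
threshold are made up for illustration and are not part of Tesseract. A real
scan has thick, slightly skewed borders and would likely need something more
robust than this.

```python
def remove_long_runs(img, min_run):
    """Return a copy of a binary image (1 = ink, 0 = background) with every
    horizontal or vertical ink run of at least min_run pixels erased."""
    height = len(img)
    width = len(img[0]) if height else 0
    out = [row[:] for row in img]

    # Horizontal runs: scan the original image so that erasing one line
    # does not shorten the runs measured in the other direction.
    for y in range(height):
        x = 0
        while x < width:
            if img[y][x]:
                start = x
                while x < width and img[y][x]:
                    x += 1
                if x - start >= min_run:
                    for i in range(start, x):
                        out[y][i] = 0
            else:
                x += 1

    # Vertical runs: same idea, column by column.
    for x in range(width):
        y = 0
        while y < height:
            if img[y][x]:
                start = y
                while y < height and img[y][x]:
                    y += 1
                if y - start >= min_run:
                    for j in range(start, y):
                        out[j][x] = 0
            else:
                y += 1

    return out


# Toy example: a 7x5 table cell border with a two-pixel "glyph" inside.
cell = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
cleaned = remove_long_runs(cell, min_run=5)
```

The threshold has to be longer than the longest stroke in the font,
otherwise parts of letters get erased along with the borders.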

Mayuresh


