go-ocr is a tool for extracting plain text from scanned documents in PDF or DjVu format and post-processing that text with user-defined rewriting rules to remove OCR artefacts and irregularities.
Recent Releases
0.4.2 03 Aug 2016 03:15
minor feature:
in the call to os.OpenFile().
Error handling rationalised.
Version increment.
0.4.1 25 Jul 2016 03:15
minor feature:
Added check for older versions of pdfimages.
Version increment.
0.4.0 18 Jul 2016 19:19
major feature:
Major changes:
- Added support for DjVu files;
- Project renamed to go-ocr.