Recent Releases

5.5.0 29 Nov 2024 11:45 major bugfix: Set hOCR capabilities ocrp_dir and ocrp_lang unconditionally. Calculate row bounding box in single-word mode per. Reduce clock syscalls. Several small performance and other code fixes. Modernized code. Print time for tessedit_timing_debug in milliseconds. Print time for ErrorCounter::ComputeErrorRate in milliseconds. cmake: Correctly set the soversion based on SemVer properties. Do not export PDBs for static libraries. Several other small fixes and improvements for builds and CI. Modernize code for renderers and remove filename conversion for Windows. Add build rule for Windows installer. Support symbolic values for --oem and --psm options. Remove Tensorflow support. Add RISC-V V support. Remove broken GitHub action msys2-4.1.1.
5.4.1 12 Jun 2024 06:25 minor bugfix: Avoid FP overflow in NormEvidenceOf. Small build and code improvements.
5.4.0 13 May 2024 09:25 major bugfix: -rc1. Build, code refactoring and other smaller changes. Fix grey result of indexed PNG in pdfrenderer. Rename frk to deu_latf (ISO 639-3, ISO 15924). Remove broken Dockerfile. Fixes for several issues reported by Coverity Scan. Remove unsupported OpenCL code and related API functions. Facilitate vectorization for generic build. Add PAGE XML renderer / export. Support training without lstmf files. Improve CCUtil::main_setup. Allow for text angle/gradient to be retrieved.
5.3.4 18 Jan 2024 11:48 minor feature: Fixes for scrollview. Fixes for autoconf, clang and sw builds. Improve OCR for an image URL. Fail on curl download errors. New parameter curl_cookiefile. Set User-Agent: header field in HTTP request for curl downloads. Output directory list from "combine_tessdata -d" to stdout. Other small improvements for code and documentation.
5.0.0-rc1 01 Nov 2021 07:05 minor feature: New release 5.0.0-rc1. Signed-off-by: Stefan Weil <sw@weilnetz.de>.
5.0.0-beta-20210916 17 Sep 2021 03:15 minor feature: Implemented sw build (cppan is deprecated). Improved cmake build. Code cleanup and optimization. A lot of bug fixes.
5.0.0-beta-20210815 16 Aug 2021 03:15 minor feature: Implemented sw build (cppan is deprecated). Improved cmake build. Code cleanup and optimization. A lot of bug fixes.
5.0.0-alpha-20210401 02 Apr 2021 03:15 minor feature: Implemented sw build (cppan is deprecated). Improved cmake build. Code cleanup and optimization. A lot of bug fixes.
5.0.0-alpha-20201231 01 Jan 2021 03:15 minor feature: Implemented sw build (cppan is deprecated). Improved cmake build. Code cleanup and optimization. A lot of bug fixes.
5.0.0-alpha-20201224 25 Dec 2020 03:15 minor feature: Implemented sw build (cppan is deprecated). Improved cmake build. Code cleanup and optimization. A lot of bug fixes.
4.1.1 27 Dec 2019 03:15 minor feature: Implemented sw build (cppan is deprecated). Improved cmake build. Code cleanup and optimization. A lot of bug fixes.
4.1.1-rc2 12 Nov 2019 06:45 minor feature: Added new renderers: Alto, LSTMBox, WordStrBox. Added character boxes in hOCR output. Added python training scripts (experimental) as an alternative to shell scripts. Better support for AVX / AVX2 / SSE. Disable OpenMP support by default. Fix for bounding box problem. Implemented support for whitelist/blacklist in LSTM engine. Improved cmake configuration. Code modernization and improvements. A lot of bug fixes.
4.1.1-rc1 02 Nov 2019 11:45 minor feature: Added new renderers: Alto, LSTMBox, WordStrBox. Added character boxes in hOCR output. Added python training scripts (experimental) as an alternative to shell scripts. Better support for AVX / AVX2 / SSE. Disable OpenMP support by default. Fix for bounding box problem. Implemented support for whitelist/blacklist in LSTM engine. Improved cmake configuration. Code modernization and improvements. A lot of bug fixes.
4.1.0 08 Jul 2019 06:45 minor feature: Added new renderers: Alto, LSTMBox, WordStrBox. Added character boxes in hOCR output. Added python training scripts (experimental) as an alternative to shell scripts. Better support for AVX / AVX2 / SSE. Disable OpenMP support by default. Fix for bounding box problem. Implemented support for whitelist/blacklist in LSTM engine. Improved cmake configuration. Code modernization and improvements. A lot of bug fixes.
4.1.0-rc4 22 Jun 2019 23:45 minor feature: 4.1.0-rc3 = VERSION. Replace TessBaseAPI::CatchSignals by a dummy function. unittest: Add missing Leptonica library for textlineprojection_test. ocrfeatures: Remove locally used functions from global interface. Remove old and misguiding build steps on windows. Remove code for embedded build. 4.1.0 Release candidate 4.
4.1.0-rc3 18 Jun 2019 03:25 minor feature: Unittest: Fix and enable params_model_test. Unittest: Add missing unittests to Makefile.am as comments. Replace sscanf by std::stringstream. Remove strtofloat. Baseapi: Use std::stringstream to format float values. Pdfrenderer: Replace snprintf by std::stringstream. Pgedit: Remove unused global functions. Remove unneeded include statements for pgedit.h. Extend ignore list. Don't include windows.h from platform.h. Missing EOL. Finding tiffio.h cmake clang on windows. Cmake: add detection of AVX, AVX2, SSE41. Cmake: show configuration summary. MSVS support inttypes.h from VS 2015. Remove unused includes. The coordinates for EOL tab. Typo in description. Windows build. Remove host.h from Tesseract API. Svutil: Clean include file. Clean macros in platform.h. Only include windows.h using host.h. Svutil.cpp: windows build. Cmake: remove host.h from installation, remove definition of NOMINMAX. Build for Windows. Remove unused variable. Spelling. Cmake: Android cross-build. Print info when uzn file is used. Intraword spacing for slightly better pdf copy-paste perfo… Cmake: linux build. Documentation about datapath: ending "/" is not relevant. Crash in case of missing PNG support in Leptonica (see #2333). Some typos (most found and fixed by codespell). Clusttool: Remove unused code and some global functions. Clusttool: Replace strtof by std::stringstream. Paramsd: Replace strtod by std::stringstream. Commandlineflags: Replace strtod by std::stringstream. Universalambigs: Add missing include file. Correct tessdata comment in baseapi.h. Tesscallback: Remove unused code. Tesscallback: Remove more unused code. Remove unused include. ScrollView: remove custom implementation of GetAddrInfo. CPPFLAGS configuration for icu4c and libarchive missing from conf… Autotools: remove list of traineddata files. Cmake: buil…
4.1.0-rc2 02 May 2019 03:15 minor feature: Add removed function to API compatibility. Removed lstm_choice_mode for backwards compatibility in 4.1. Add some of the lstm_choice_mode functionality to restore compatibili… C-API compatibility with 4.0.0 version. ETEXT_DESC: backwards compatibility with 4.0.0 API. Revert "C-API compatibility with 4.0.0 version".
5.0.0-alpha 21 Apr 2019 03:15 minor feature: Refactor class Network. Allow UTF-8 variant of C locale. Change option -l to --lang. Correct handling of 0BF0-0BFA Tamil numbers and symbols. Use space instead of tab. Install lstmbox and wordstrbox config files. Add lstmbox and wordstrbox to C-API. Add lstmbox and wordstrbox to capi.h. Merge branch 'master' into mya. Rename LSTMBOX to LSTMBox. Add TSV option to C-API. Validator: compiler warnings (signed/unsigned). BoxChar: compiler warnings (signed/unsigned). ICOORD: old type casts. commandlineflags: compiler warnings (signed/unsigned). PAGE_RES_IT: Optimize compare operators by using inline code. Rename function to TessBaseAPIGetTsvText to be consistent to the Crea… Format new code with clang-format. Format new code with clang-format. Added an additional optional --tmp_dir parameter to specify the tempo… Added the same --tmp_dir flag to tesstrain_utils.sh. Add initial support for traineddata files in standard archive formats. Document that configfile can be a file path. Trying to add tessedit_char_whitelist etc. again. LSTM char_whitelist/blacklist (6ac2ff0): also sublangs. LSTM char_whitelist/blacklist (6ac2ff0): multi-code chars. unittest: Add another file from Abseil. unittest: Add missing libarchive. unittest: Remove tmp directory from repository and create it during b… LSTM char_whitelist/blacklist (6ac2ff0): more robust. Heap-buffer-overflow in GenericVector::size. Assertion caused by wrong unicharset. Index-out-of-bounds in IntegerMatcher::UpdateTablesForFeature. cmake: enable libArchive support for non_cppan build. Report libArchive support. Add libarchive dependency to cppan and sw builds.
4.1.0-rc1 18 Feb 2019 06:05 minor feature: Amharic font list. IntSimdMatrixSSE: comment. Use -ffast-math for calculation of dot product. Potential crash in STRING class. Potential crash in tprintf. Remove unneeded test for nullptr. Add missing static attribute to local inline functions. IntSimdMatrixSSE: Remove unused include statement and simplify code. SIMDDetect: Use tesseract namespace and format code. Add config variable for selection of dot product function. GENERIC_2D_ARRAY: runtime error in assignment operator. Wrong font attributes in hOCR output. Add check whether compiler supports -march=native flag. Wrong x_fsize in hOCR output (regression). SimpleStats: Remove unused method. FPRow: Remove three unused methods. Several typos (most of them found by codespell). Use std::stringstream to generate ALTO output and add element. protos: Remove several unused macros, functions and global variables. protos: Remove unused config variable. Remove setting constant resolution from ImageThresholder::SetImage. Include ALTO in list of supported output formats. Move code for hOCR renderer to new file. Format code in new file hocrrenderer.cpp. Indentation of hOCR output. Add new hocrrenderer.cpp to CMakeList.txt and Android.mk. Value for PHYSICAL_IMG_NR in ALTO output. Use std::stringstream to generate hOCR output. Switch windows builds to SW. Revert "Switch windows builds to SW.". Add missing the implementation for TessBaseAPIGetAltoText method in C… Don't try to create text output if other renderers failed (regres… Provide info about compiled openmp version. Remove altorenderer.cpp from resource compiling (already included in… Merge branch 'master' of https://github.com/tesseract-ocr/tesseract. Add support for clang on windows (cmake ninja). Compiler warning. Clang warnings. Fix and simplify SIMD tests. Update cma…
4.0.0 30 Oct 2018 21:45 minor feature: Add debug configuration for LSTM. Add lstmdebug config to distribution and installation process. 4.0.0 Release.
4.0.0-rc4 25 Oct 2018 07:05 minor feature: cppan build. commontraining: two comments. CID 1396172 (Uninitialized members). Revert "CID 1396172 (Uninitialized members)". TessPDFRenderer: Remove unused member variable jpg_quality_ (CID 1396… CycleTimer: Add missing initialization (CID 1396168). lstmtraining: Handle failed remove syscall (CID 1396166). Classify: new resource leak (CID 1396163). Uninitialized scalar variable (CID 1395880). OpenclDevice: Catch negative index (CID 1395110). SVNetwork: Handle failed socket call (CID 1164597). classify/cluster: Replace Emalloc by std::vector. Sum computation in higher precision. LLSQ: Replace sqrt by std::sqrt. Sum computation in higher precision. Renamed GetGlyphConfidences() to GetChoices() and glyph_confidences t… LineHypothesis: Add copy assignment operator. ParamsTrainingHypothesis: Add copy assignment operator. BLOB_CHOICE: Add copy assignment operator. ROW: Add declaration for copy constructor. C_OUTLINE_FRAG: Add declaration for copy constructor. BlamerBundle: Add declaration for copy assignment operator. unittest: Add more files from Google. TessResultRenderer: Extend API to access status of renderer. tesseractmain: Show error message when output file could not be created. Update test submodule. Add configuration for LGTM. Free PangoFontMap. cluster: some potential overflows. BLOBNBOX: Declare signed bit field. Configuration for LGTM. Rename API function for getting LSTM choices. Rename API function from GetBestLSTMChoices to GetBestLSTMSymbolChoices. training: Don't hide global variables. Define NOUNDEFINED for cygwin. Merge branch 'master' of https://github.com/tesseract-ocr/tesseract. Revert "free PangoFontMap;". Revert "prefer to use FreeType for pango_cairo_font_map". Remove type cast and compiler warning (-Wcast-qual). ScrollView: Optimize local table_colors. Install tra…
4.0.0-rc3 15 Oct 2018 21:05 minor feature: Using c-api / compile with gcc. Merge branch 'master' of https://github.com/tesseract-ocr/tesseract. Merge branch 'master' of https://github.com/tesseract-ocr/tesseract. Plumbing: Remove comparison which is always false. pgedit: remove unused declaration of display_bln_lines. svpaint: Change a variable from global to local. UNICHARMAP: Remove comparison which is always false. Classify: Don't hide debug parameter. SVPaint: Remove empty block. Avoid crash with --psm 0 and LSTM traineddata. Always use isascii() with isspace(). Update test submodule. Update googletest submodule to release v1.8.1. Keep API compatibility with #1265. Remove code for _MSC_VER 1900. Remove virtual specifiers. Merge branch 'master' of https://github.com/tesseract-ocr/tesseract. Uninitialized variable, remove unused variable. hocr: add ocrp_wconf to unconditional ocr-capabilities. Integer overflow in overlap calculation. Use env variable in AppVeyor configuration. Remove insight.io badge. Remove not existing directory from autotools distribution. Merge branch 'master' of https://github.com/tesseract-ocr/tesseract. Building of ScrollView.jar with modern java version. Add Abseil as a submodule (needed for some of the new unit tests). Update test submodule. Add more hacks for use with Google unittests. Enhance LOG emulation. Add a basic implementation of class CycleTimer. unittest: Add baseapi_test. unittest: Add qrsequence_test. unittest: Add fileio_test. Remove gradechop.h. Remove tab character in source files. unittest: Add imagedata_test. unittest: Add paragraphs_test. unittest: Add lang_model_test (only works partially). unittest: Add mastertrainer_test (only works partially). unittest: Disable build rules for tests which still fail to build. Adapt info about ScrollView.jar build. Add cmake files to autotools distribution packa…
4.0.0-rc2 08 Oct 2018 13:45 minor feature: Remove duplicate help from combine_lang_model. Move content of ipoints.h to points.h and remove ipoints.h. Comments. CID 1395882 (Uninitialized scalar variable). Print help for tesstrain.sh. CID 1164579 (Explicit null dereferenced). Version info in VERSION. Update tesseract man page about both OCR engines in tesseract 4. Update README about both OCR engines in tesseract 4. Don't set page segmentation mode for hocr, pdf and tsv configs. Allow orientation detection with any traineddata. Don't set page segmentation mode for unlv config. Update tesseract man page. Add Makefile rule to build HTML manpages. Document some more config options for tesseract. Merge and enhance documentation on language and script models. Implement parameter min_characters_to_try for minimum characters to t… lstmtraining: Check write permission for output model. combine_tessdata: Handle failures when extracting. lstmtraining: Remove dead code for purified model name. Use of wrong UNICHARSET. Constructor for class Dict (uninitialized member variables). genericvector: Rewrite code to satisfy static code analyzer. intproto: Use more efficient float calculations for floor. rect: Use more efficient float calculations for ceil, floor. chop: Use more efficient float calculations for sqrt. genericvector: Pass parameters by reference. GENERIC_2D_ARRAY: Pass parameters by reference. WERD_RES: Remove comparisons which are constant. Improve description of min_characters_to_try variable. pgedit: Change some variables from global to local ones. "mktemp -d --tmpdir" on Mac OS; see #1453. Merge branch 'master' of https://github.com/tesseract-ocr/tesseract. Rework check for readable input file. Use pdf L_FLATE_ENCODE only for png input. Release candidate 2.
4.0.0-rc1 01 Oct 2018 20:25 minor feature: Added JPEG quality option parameter (-c jpg_quality=n). Reported by Coverity Scan. Reported by Coverity Scan. TessPDFRenderer: Improve robustness of API. Detected by Coverity Scan. Whitespace. Detected by Coverity Scan. Detected by Coverity Scan. Potential crash with --psm 0 and use osd.traineddata automatically. ImageThresholder::OtsuThresholdRectToPix for OpenCL. ColPartition: Rename median_size_ to median_height_. Initial commit to add Aksara Jawa - Javanese script. Typo re Javanese. Change validate javanese similar to indic. Revert Makefile.am to beta.2. Remove duplicate include. scrollview: Clean include statements. Typo in function name. Typo in comments and variable name. Typo in function name. Typo in comments and variable name. Javanese script training. Remove duplicate include. Add variable --save_box_tiff to Save box/tiff pairs along with lstmf… Added the option for character accumulated glyph confidences. CID 1395116 ('Constant' variable guards dead code). CID 1395114 ('Constant' variable guards dead code). CID 1395113 ('Constant' variable guards dead code). CID 1395109 (Logically dead code). CID 1395108 (Dereference after null check). CID 1164567 (Dereference after null check). Assertion caused by access to default TBOX. New whitespace. Convert CRLF line endings to LF. Move class tesseract::File from training to ccutil. Add more portability hacks for Google test environment. Add more unittests from Google. unittest: Fix and enable bitvector_test. unittest: Fix and enable cleanapi_test. unittest: Fix and enable colpartition_test. unittest: Fix and enable denorm_test. Add ARRAYSIZE macro for Google test environment. unittest: Fix and enable heap_test. unittest: Fix and enable indexmapbidi_test. unittest: Fix and enable intfeaturemap_test. unittest: Fix and enable linlsq_tes…
4.0.0-beta.4 31 Jul 2018 15:25 minor feature: CID 1393540 (Explicit null dereferenced). CID 1393244 and CID 1393244 (Uninitialized scalar variable). CID 1393243 (Uninitialized scalar field). CID 1393239 (Dereference null return value). CID 1393238 (Dereference null return value). CID 1393241 (Dereference null return value). Replace ASSERT_HOST in genericvector.h. Remove errcode.h from public API. Remove public API file ndminx.h. Clean usage of assert.h. Replace string.h by standard C++ cstring. Remove LSTM header files from public API. Remove arch header files from public API. Remove unneeded include statements for scanutils.h. Remove recursive header. Clean some include statements. Remove memry.h from public API. Remove empty tessbox.h. Clean more include files and include statements. coutln: Replace alloc_mem, free_mem by standard functions. adaptions: Remove unneeded include statement. qspline: Remove unneeded include statement. strngs: Replace alloc_mem, free_mem by standard functions. gap_map: Replace alloc_mem, free_mem by C++ new, delete. pitsync1: Remove unneeded include statement. qspline: Replace alloc_mem, free_mem by C++ new, delete. makerow: Replace alloc_mem, free_mem by C++ new, delete, std::vector. oldbasel: Replace alloc_mem, free_mem by C++ new, delete, std::vector. pithsync: Replace alloc_mem, free_mem by C++ std::vector. tordmain: Replace alloc_mem, free_mem by C++ std::vector. Remove memry.cpp, memry.h. Remove stderr.h and its include statements. dotproductsse: include statements. Update VERSION. CID 1386094 (Unread field). CID 1386098 (Dubious method used). CID 1386104 (Dereference null return value). CID 1386083 (Dereference null return value). CID 1164746 (Big parameter passed by value). CID 1157757 (Logically dead code). CID 1158180 (Argument cannot be negative) and clean code a bit. CID 1242849 (U…
4.0.0-beta.3 24 Jun 2018 14:25 minor feature: Remove more header files from public API.
4.0.0-beta.2 20 Jun 2018 03:17 minor feature: Download the leptonica source from github. Add new line to a few error messages. Filenames in comments. From pull of cleanups: clang tidied, reviewed, new,… Added script-specific validation and normalization for virama-using s… Build broken by previous commits that added use of string in lo… Deleted some dead LSTM code, making everything use the recoder. Removed changes from last commit that didn't belong. Move LSTM unicharset and recoder to traineddata with version string p… Type of bit values. Wrong data type in argument for sscanf. Remove extra semicolons. Windows build. Regression of. PangoFontInfo: Remove unused method is_fraktur. PangoFontInfo: Remove unused method is_monospace. PangoFontInfo: Remove unused method is_smallcaps. PangoFontInfo: Remove unused method is_bold. PangoFontInfo: Remove unused method is_italic. Use lept_free to free memory allocated by Leptonica. Regression of again!. BestPix to always return the highest resolution available, even… Removed unnecessary using statements and cleaned up google/non-google… Important to RTL languages saves last space on each line, which w… Clang tidy on previous pull. Add googletest submodule. cmake: Add googletest. googletest: Add dummy test. Changed the way unicharsets are handled to allow support for the ch… Rewrote the recoder to use an encoding based on wubi instead of radic… Define std::max under VS2017 x64. Part 2 of separating out the unicharset from the LSTM model, ing c… Added ADAM optimizer, unless git screwed it up, cos there is no diff. Removed errors introduced by git merge. Added AVX2 and AVX512 detector. Added convert to int and directory listing to combine_tessdata.
4.0.0-beta.1 11 Mar 2018 19:05 minor feature: Remove unused method TessdataManager::OverwriteEntry. Remove unused method TessdataManager::LoadFileLater. Crash if output file could not be opened. Cleanup. Inside main() use return rather than exit. Improve robustness of TessdataManager. automake: Enable all warnings and fix a warning. genericvector: Add overloaded LoadDataFromFile. Remove unneeded null pointer check. Replace Standard C library header files by C++ header files. Remove obsolete comments and unused code from ccutil/host.h. EquationDetect: Remove unneeded new / delete operations. Fix and improve Dockerfile. opencl: Remove more unused code. README: Add Coverity badge. Update README.md. Reduce number of new / delete operations for class KDTreeSearch. Reduce number of new / delete operations for class LanguageModel. UNICHARSET: Add missing initialization. Optimize LSTM code for builds without OpenMP. Use correct name for Mac OS X, correct link to training wiki. Update documentation for installation. Reorganize Readme.md. Update Template. Add link to `the guidelines for this repository`. Add link to guidelines for this repository. Add badges for Doxygen and Wiki documentation. Typo. Update readme for 3.05.01. StringRenderer::pen_color_: int 3 - double 3. Change Mac OS X to macOS. PangoFontInfo: Remove unused method is_fraktur. Remove strcasestr which is no longer needed. PangoFontInfo: Remove unused method is_monospace. PangoFontInfo: Remove unused method is_smallcaps. PangoFontInfo: Remove unused method is_bold. PangoFontInfo: Remove unused method is_italic. Make less verbose. opencl: Remove unused code. opencl: some compiler warnings. LSTMTrainer: Catch empty vectors. Update from Leptonica 1.74.1 to 1.74.2. Travis CI for Leptonica 1.74.2. Remove local implementation of…
3.05.01 02 Jun 2017 06:39 major bugfix: Bugfix release for stable tesseract version.
3.05.00 17 Feb 2017 11:05 minor feature: Made some fine tuning to the hOCR output. Added TSV as another optional output format. Fixed ABI break introduced in 3.04.00 with the AnalyseLayout() method. text2image tool: Enable all OpenType ligatures available in a font. This feature requires Pango 1.38 or newer. Training tools: Replaced asserts with tprintf() and exit(1). Fixed Cygwin compatibility. Improved multipage tiff processing. Improved the embedded pdf font (pdf.ttf). Enable selection of OCR engine mode from command line. Changed tesseract command line parameter '-psm' to '--psm'. Added new C API for orientation and script detection, removed the old one. Increased minimum autoconf version to 2.59. Removed dead code. Fixed many compiler warnings. Fixed memory and resource leaks. Fixed some issues with the 'Cube' OCR engine. Fixed some openCL issues. Added option to build Tesseract with CMake build system. Implemented CPPAN support for easy Windows building.
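The command-line changes listed for 3.05.00 (the '--psm' spelling, engine-mode selection, and TSV output) can be illustrated with a small Python sketch that only assembles an argument list and does not run Tesseract. The helper name `build_tesseract_cmd`, the file names, and the mode numbers are assumptions made for illustration; `--oem` is taken from the 5.5.0 entry in this list, and `tsv` is assumed to be the config name that selects TSV output.

```python
import shlex
from typing import List, Optional


def build_tesseract_cmd(image: str, output_base: str,
                        psm: Optional[int] = None,
                        oem: Optional[int] = None,
                        tsv: bool = False) -> List[str]:
    """Assemble a tesseract argument list.

    Hypothetical helper for illustration; it is not part of Tesseract.
    """
    cmd = ["tesseract", image, output_base]
    if oem is not None:
        # OCR engine mode, selectable from the command line since 3.05.00.
        cmd += ["--oem", str(oem)]
    if psm is not None:
        # Page segmentation mode; spelled '-psm' before 3.05.00.
        cmd += ["--psm", str(psm)]
    if tsv:
        # Assumed config name that requests TSV output.
        cmd.append("tsv")
    return cmd


if __name__ == "__main__":
    # Print the command as a shell-quoted string instead of executing it.
    print(shlex.join(build_tesseract_cmd("page.png", "page", psm=6, oem=1, tsv=True)))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting concerns if Tesseract is installed.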
4.00.00alpha 16 Dec 2016 09:05 minor feature: Remove unneeded definition for NULL. Use different font list and exposures for "lat" language training. Add info for progress monitor, make it visible in doxygen doc; remove… Add Junicode to neo-Latin fonts. Update ci scripts. Test release build on windows. Update appveyor.yml. Update appveyor.yml. Update appveyor.yml. Training should work now. Update .travis.yml. Update appveyor.yml. Update CMakeLists.txt. Update .travis.yml. Merge branch 'master' of github.com:tesseract-ocr/tesseract. Update CMakeLists.txt. Update leptonica version. Update .travis.yml. Update appveyor.yml. Merge branch 'master' of github.com-egorpugin:egorpugin/tesseract. Update CMakeLists.txt. Improve leptonica search. Make box training work. Compatibility with Leptonica 1.73. Add more include directories. Merge branch 'master' of github.com:tesseract-ocr/tesseract. Update README.md. Update README.md. Update README.md. Replace pdf.ttf with sharp2.ttf, keep name the same. Document hocr_font_info in config. INCOMPATIBLE to hOCR line height information. Varsize array for Microsoft compiler. Only generate dir for HOCR when needed. Emit fewer "lang" attributes. Add LTR mixed direction test files. Update README.md. Compiler warning (signed / unsigned mismatch). Adds char GetHOCRTSVText(int) as placeholder. Copy of char GetHOCRT… Adds TessHOcrTsvRenderer class for rendering HOCR info in tsv format. Calls TessHOcrTsvRenderer if tessedit_create_hocrtsv is true. Adds hocrtsv file to configs folder. Adds hocrtsv to tessdata/configs/Makefile.am. Adds BoolParam tessedit_create_hocrtsv in class Tesseract. Render output in TSV format. Avoids HTML escaping. Cleanup TSV renderer. hocrtsv references in Makefile. Add inactivity timeout for icu download on windows. Move new delete histogramAllChannels inside the #ifdef USE_OPENCL; fi… Update INSTALL.GIT.md. Improve tesseract.pc.in. Solve segfault for box.train. Update Release Notes. Don't display tesseract's banner when quiet.
3.04.01 17 Feb 2016 10:45 minor feature: Add check for opencl requirements. Rework opencl requirements (configure: error: conditional "AMDEP"… Typo. GRAPHICS_DISABLED build. Strcasestr needed on Cygwin too. Libicui18n is only called libicuin on mingw, not cygwin. Implement build without cube (-DNO_CUBE_BUILD). Tessedit_create_txt 0 blocks box training. Memory leak based on (https://code.google.com/p/tesse… Remove empty header file secname.h. Replace CubeUtils::UTF8ToUTF32 in pdfrenderer. Enable pdfrender with NO_CUBE_BUILD. NO_CUBE_BUILD with reverting to ANDROID_BUILD in baseapi. Improve NO_CUBE_BUILD. In UTF-16BE conversion. Remove extraneous line feed. VC14 compiler. Enable OpenMP support. Turn off optimisation in Microsoft Visual Studio for TextlineProjecti… Rename README to README.md. Remove info about VS 2008. To compile tesseract on mac with clang. For OpenCL reported on Apple Mac. Still get -54 on Apple… VS2010 build. OpenCL build on Mac. Configure.ac for OS X and -framework. Missing "allheaders.h" when compiling with --enable-opencl on OS X. Various clang compilation errors. Get OpenCL to compile on OS X. Configure.ac unconditionally enabling OpenCL. Add ULL to constants which overflow 32 bits. Simplify build and run of ScrollView. Tesstrain.sh: Only fall back to default Latin fonts if none were prov… Tesstrain.sh: Only set FONTS if they weren't set on the command line. Tesstrain.sh: Initialise fontconfig even if Arial isn't available. Remove --bin_dir option from tesstrain.sh (should use PATH instead). Add --exposures option to tesstrain.sh. Use mktemp to create workspace directory. COPYING: typo found by codespell. Api: typos in comments (all found by codespell). Ccmain: typos in comments and strings. Typo. Ccstruct: typos in comments and strings. Ccutil: typos in comments and strings. Classify: typos in comments and strings. Cube: typos in comments. Cutil: typos in comments. Dict: typos in comments and strings. Doxyfile: typo in comment. Java: typos in comments and strings. Wordrec: ty…
3.04.01dev 25 Aug 2015 03:45 minor feature: Added OpenCL support (experimental). Many bug fixes.
3.04.00 20 Aug 2015 08:26 minor feature: Tesseract development is now done with git and hosted at github.com (previously we used Subversion as a VCS and code.google.com for hosting). Tesseract now requires leptonica 1.71 or a higher version. Removed official support for VS 2008. Added support for many more scripts/languages. Major updates to training system as a result of extensive testing on 100 languages. Improved performance with PIC compilation option. Significant change to invisible font system in pdf output to improve correctness and compatibility with external programs, particularly ghostscript. Improved font identification. Major change to improve layout analysis for heavily diacritic languages: Thai, Vietnamese, Kannada, Telugu etc. Fixed problems with shifted baselines so recognition can recover from layout analysis errors. Major refactor to improve speed on difficult images, especially when running a heap checker. Moved params from global in page layout to tesseractclass. Improved single column layout analysis. Allow ocr output to multiple formats using tesseract command line executable. Fixed issues with mixed eng+ara scripts. Improved script consistency in numbers. Major refactor of control.cpp to enable line recognition. Added tesstrain.sh, a master training script. Added ability to text2image training tool to just list available fonts. Added ability to text2image to underline words. Improved efficiency of image processing for PDF output. Added parameter description for each parameter listed with 'print-parameters' command line option. Added font info to hocr output. Enabled streaming input and output of multi-page documents. Many bug fixes.