|Tags||vbnet mono win32 docx word error-correction xml data-recovery|
3.0.0 07 Apr 2015 15:05 major feature: Fixed a bug in UTF-8 encoding for the RT_TextBytesAtom record in the PPT parser. Initial Open Document Flat XML parser (fodt, fods, fodg, fodp). Initial EML parser (with the ability to extract attachments). Capabilities of the C API have been expanded. Better charset detection in the HTML parser. New TXT parser (can be used to change encoding to UTF-8). ListStyle is now a class (not an enum). Initial PDF parser. The whole public interface of DocToText (PlainTextExtractor, Metadata, Link, Exception, FormattingStyle) is available under the doctotext namespace. DocToText now supports exceptions (Exception class). Reorganization of URL handling. PlainTextExtractor can now return a list of links. Supported parsers: HTML/EML/ODF_OOXML/ODFXML. PlainTextExtractor now allows parsing files from a memory buffer. Independence from glib, libgsf, and gettext. Pthreads are used instead of gthreads, and the built-in OLEReader is used instead of libgsf. New iWork parser. New XLSB parser. Fixed extraction of the number of pages from ODG files. ODG files added to the automatic tests. Added support for Object Linking and Embedding (OLE) in ODF formats. Management of libxml2 (initialization and cleanup) can be disabled. Thread safety fixed in the ODF, OOXML, and DOC parsers. Better handling of fields in DOC and DOT files. Better handling of headings in ODF documents. Fixes for the x86_64 architecture. Improved stability in multithreaded environments. Embedded XLS workbooks in DOC files are supported without creating temporary files. Cleanups in the ODF and OOXML parsers. Memory consumption of the ODF and OOXML parsers significantly reduced. Better handling of fields in DOCX files. Fixed a crash in the RTF parser on invalid files. Fixes in the XLS parser. Initial port to the win64 architecture. Function entry and exit tracking feature, useful for debugging.