Xidel 0.9.9

Xidel is a command line tool to download web pages or JSON APIs and extract data from them. It can download files over HTTP/S connections, follow redirections, links, extracted values, and (partially) filled-in forms, and process local files. The data can be extracted using XPath 2.0, XQuery 1.0, XPath/XQuery 3.0 and JSONiq expressions, CSS 3 selectors, and custom pattern-matching templates that work like an annotated version of the processed page. The extracted values can then be exported as plain text, XML, HTML, or JSON, or assigned to variables to be used in other extract expressions or exported to the shell. There is also an online CGI service for testing.

Tags: www, interpreter, shells, xml, html, xquery, xpath, json, console, developers
License: GNU GPLv3
State: development

Recent Releases

0.9.9, 12 Jul 2023 03:15, minor feature (under development):
- Supports 99.6% of XPath/XQuery 3.1, for example the ? operator, the => operator, map:/array: functions, and string constructors.
- Supports some XPath 4.0 syntax: the extended ? operator, the -> operator, the otherwise operator, for member, and the functions some, all, identity, characters, replicate, map:filter, index-where, and is-NaN.
- New --json-mode option to switch between XPath 3.1 and JSONiq syntax for JSON processing, or (the default) a mix of both.
- New extension functions: x:request-decode, to parse the parameters of an HTTP request; inner-text, to get the visible text of an HTML page (without full CSS support); and matched-text, for use in pattern matching. matched-text replaces text() in patterns, so stop using text() there: it is now deprecated and will be replaced by the standard ./text() kind test.
- New behaviour for form() when building an HTTP request to submit an HTML form element: the element has become optional and defaults to the first form element on the current web page. Additional values can be given as a sequence, and each item of the sequence is sent separately; for example, "key":(1,2) sends key=1 and key=2, while "key":() sends nothing for that key. Rarer elements such as image buttons or _charset_ input elements are handled.
- x:replace-nodes, to replace nodes (replacing the deprecated pxp:transform function).
- --in-place option to overwrite the input file with the output.
- A new element in multipage templates to download JSON data.
- Whitespace normalization is now disabled by default.
- The implicit timezone is set from the local time.
- Options for HTTPS certificates; certificate validation is enabled by default.
- Improved parsing and serialization: faster, new options for x:serialize-json, and non-breaking spaces are no longer escaped in XML/HTML output.
- Improved error messages.
- Reduced memory usage.
- Compiled with a newer FreePascal.
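The new form() behaviour, where each item of a value sequence is sent as its own key=value pair and an empty sequence sends nothing, follows the usual convention for repeated keys in URL-encoded form data. The Python sketch below illustrates only that encoding rule, using the standard library's urllib.parse.urlencode with doseq=True; it is not Xidel's implementation, and the helper name encode_form and the field names (search, key) are hypothetical.

```python
from urllib.parse import urlencode


def encode_form(fields):
    """Hypothetical helper: URL-encode form fields, one pair per sequence item.

    A sequence value such as (1, 2) becomes key=1&key=2, and an empty
    sequence contributes no pair at all for that key.
    """
    # doseq=True expands list/tuple values into repeated key=value pairs;
    # plain strings are still treated as a single value.
    return urlencode(fields, doseq=True)


if __name__ == "__main__":
    print(encode_form({"key": (1, 2)}))                 # key=1&key=2
    print(encode_form({"search": "xidel", "key": ()}))  # search=xidel
```

Because the empty tuple produces no pair, the key disappears from the request entirely rather than being sent with an empty value.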
0.9.8, 03 Apr 2018 00:18, major feature: The 0.9.8 version
- improves cookie handling to follow RFC 6265 rather than sending all cookies to all servers.
- adds t:siblings-header/siblings elements to the pattern matcher to match certain element siblings regardless of their order (e.g. table columns).
- adds the functions x:call-action, x:has-action, x:get-log, and x:clear-log to give programmatic access to multipage templates and the variable changelog.
- adds --module and --module-path parameters to load XQuery modules into (XPath) queries, and properly resolves relative paths for module imports.
- fixes system(), file:exists, file:move (overwriting), and file:path-to-uri on Windows.
- has further minor bug fixes and performance improvements.
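The 0.9.8 cookie change refers to RFC 6265, under which a stored cookie is returned only to hosts that match its domain instead of to every server. As an illustration, here is a minimal Python sketch of the RFC's domain-matching rule (section 5.1.3), plus the host-only case for cookies set without a Domain attribute. It is not taken from Xidel's source; the function names are hypothetical, and path matching, the Secure flag, expiry, and public-suffix checks are deliberately left out.

```python
import ipaddress


def is_ip_address(host: str) -> bool:
    """Return True if host is a literal IPv4 or IPv6 address, not a host name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_match(request_host: str, cookie_domain: str) -> bool:
    """Simplified domain-match check following RFC 6265, section 5.1.3."""
    host = request_host.lower()
    domain = cookie_domain.lower()
    # RFC 6265 5.2.3: a single leading "." in the Domain attribute is ignored.
    if domain.startswith("."):
        domain = domain[1:]
    if not domain:
        return False
    # Rule 1: identical strings always match.
    if host == domain:
        return True
    # Rule 2: domain is a suffix of host, the character just before the suffix
    # is ".", and host is a host name rather than an IP address.
    return (
        host.endswith(domain)
        and host[-len(domain) - 1] == "."
        and not is_ip_address(host)
    )


def should_send(request_host: str, cookie_domain: str, host_only: bool) -> bool:
    """Hypothetical helper: decide by domain alone whether to send a cookie."""
    if host_only:
        # Cookie was set without a Domain attribute: only the exact origin host.
        return request_host.lower() == cookie_domain.lower()
    return domain_match(request_host, cookie_domain)
```

With this rule, a cookie stored for Domain=example.com is sent to www.example.com but not to badexample.com or example.org, and a host-only cookie from example.com is not sent to its subdomains.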
0.9.6, 20 Nov 2016 13:56, major feature: The 0.9.6 version
* adds the function x:request for HTTP or follow-like requests inside a query
* new functions: x:argc, x:argv, x:integer, x:integer-to-base
* fixes entities not being decoded when --output-encoding was not set
* improves the default encoding settings when converting between the Windows terminal encoding and UTF-8 for piped files
* a new JSON parser with two distinct modes: the input formats json/json-strict accept/reject invalid JSON
* JSON output is prettified.
* An XQuery version declaration disables all extensions unless a version code like "3.0-xidel" or "3.0-jsoniq" is used. #!xidel in the first line is ignored, so it can be used for executable XQuery scripts.
* has various fixes, performance improvements and internal restructuring
0.9.4, 09 Jun 2016 14:06, major feature: The 0.9.4 release
- completes support for XQuery 3.0 and XPath 3.0, e.g. the format-* and math:* functions
- uses a new underlying regular expression engine, FLRE, which supports Unicode and the \p, \c, \i character classes, and is supposed to be extremely fast
- has stricter error conditions: invalid XPath queries that were previously evaluated are now rejected, for example (?i) or \b in regular expressions, (1 + if(..) then ...) instead of (1 + (if(..) then ...)), or gYear("123") instead of gYear("0123"). This improves XQuery Test Suite conformance: it passes basically 100% of the XPath 2 tests and over 99.5% of the XQuery 3 tests
- supports the EXPath file module to write or read local files
- custom functions: random($max), random-seed(), x:product($seq)
- new multipage template statements
- the error-raising HTTP codes are now customizable
- the headers of the last HTTP request can be accessed via $headers
- improved debugging facilities: evaluation tracing, colored output, improved error messages
- the default data model for primitive types is now XML Schema 1.1 and Unicode 8.0 conformant
- boolean operations are short-circuit evaluated
- --quiet/-q was renamed to --silent/-s
- the XIDEL_OPTIONS environment variable is checked
- various fixes and performance improvements
0.9, 03 Jul 2015 16:17, major feature: The 0.9 release adds support for most of the XPath/XQuery 3.0 syntax, such as anonymous and higher-order functions; supports multipart HTTP requests for file uploads; changes the default output format; adds an (experimental) function for page modifications; fixes a large number of bugs, mostly related to command line parsing and XPath/XQuery standard compatibility; and more.