|Tags||shell csv processing sorting filtering command-line hashed-data|
5.7.0 18 Mar 2020 03:17 minor feature: Miller is available via MacPorts thanks to @herbygillot. Miller tracking is #273. An Alpine Linux port is pending this release thanks to @terorie. Miller tracking is #293. The new remove-empty-columns and skip-trivial-records verbs are keystroke-savers for things which would otherwise require DSL syntax, as tracked in #274. A bug regarding optional regex-pattern groups was fixed in #277. As of #294 you can now specify --implicit-csv-header for the join-file in mlr join. A bug with spaces in XTAB-file values was fixed on #296. A bug with missing final newline for XTAB-formatted files using mmap was fixed on #301. Look-and-feel at http://johnkerl.org/miller/doc/ is (hopefully) improved, including clearer visual indication of which section/page you're currently looking at. Note that this change has been live for a few weeks, as look-and-feel-related doc-mods from post-5.6.2 were backported to http://johnkerl.org/miller/doc/. DSL-function documentation at http://johnkerl.org/miller/doc/reference-dsl.html#Built-in_functions_for_filter_and_put,_summary is improved. For an incremental performance gain (perhaps 10-20% of run time at most, but see below), within the C source code one can use the mmap system call to access input files via pointer arithmetic rather than malloc-and-memcopy using stdio. However mmap is not available when reading from standard input -- it cannot be memory-mapped. This means all file-format readers are implemented twice within the Miller source code. While I try to regression-test Miller thoroughly, running all canned tests through mmap and stdio mode, I've nonetheless found my mmap implementations liable to corner-case bugs which I miss but users find: for example #29, #102, and #296.
As tracked on #160, various operating systems do not release mmapped pages after use as one might intuit, meaning that for large files and/or large numbers of files, I've for a long time now needed to have Miller opt out of mmap usage for precisely those cases which most need the performance gai
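The two data-cleaning verbs named in the 5.7.0 notes can be pictured with a rough Python sketch. This is an illustration of the described behavior on in-memory records, not Miller's C implementation; the function names and the sample records are made up for the example, and "empty" is assumed to mean the empty string.

```python
def skip_trivial_records(records):
    """Drop records that have no fields or whose values are all empty strings."""
    return [r for r in records if any(v != "" for v in r.values())]


def remove_empty_columns(records):
    """Drop every field name whose value is empty in all records."""
    keep = {k for r in records for k, v in r.items() if v != ""}
    return [{k: v for k, v in r.items() if k in keep} for r in records]


# Hypothetical sample data: column "b" is empty everywhere, record 2 is trivial.
records = [
    {"a": "1", "b": "", "c": "x"},
    {"a": "", "b": "", "c": ""},
    {"a": "3", "b": "", "c": "z"},
]
cleaned = remove_empty_columns(skip_trivial_records(records))
print(cleaned)
```

Note that removing empty columns needs to see every record before emitting any output, whereas skipping trivial records can be done one record at a time.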
5.6.2 23 Sep 2019 01:45 minor feature: Fixes a corner-case bug with more than 100 CSV/TSV files with headers of varying lengths. The new http://johnkerl.org/miller/doc/whyc-details.html is an elaboration on http://johnkerl.org/miller/doc/whyc.html which answers a question posed by @BurntSushi on Reddit a couple of years ago which I did not address in detail at the time.
5.6.1 18 Sep 2019 06:05 minor feature: 5.6.0 release docs. 5.6.0-dev. "interger" spelling. Whatis info in manpage. Generate manpage.html artifact from PR 269. - in doc. Merge. Mobile-friendly docs. Merge. Mobile-friendly docs. 5.6.1.
5.6.0 13 Sep 2019 09:05 minor feature: The new system DSL function allows you to run arbitrary shell commands and store their output in field values. Some example usages are documented here. This is in response to #209. There is now support for ASV and USV file formats. The new format-values verb allows you to apply numerical formatting across all record values. The DKVP I/O in Python sample code now works for Python 2 as well as Python 3. There is a new cookbook entry on doing multiple joins. The toupper, tolower, and capitalize DSL functions are now UTF-8 aware, thanks to @sheredom's marvelous https://github.com/sheredom/utf8.h. The internationalization page has also been expanded. Fixed a bug using in-place mode in conjunction with verbs (such as rename or sort) which take field-name lists as arguments. Fixed a bug in the label verb when one or more names are common between old and new. Fixed a corner-case bug when (a) input is CSV; (b) the last field ends with a comma and no newline; (c) input is from standard input and/or --no-mmap is supplied.
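As a rough picture of what the system DSL function does (run a shell command, keep its standard output as a field value), here is a minimal Python sketch. It is an analogy using the standard subprocess module, not Miller code; the record, the field name greeting, and the echo command are invented for the example, and stripping the trailing newline is an assumption about the intended behavior.

```python
import subprocess


def system(command):
    """Run a shell command and return its stdout without trailing newlines."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.rstrip("\n")


# Store the command output in a field of a (hypothetical) record.
record = {"x": "3"}
record["greeting"] = system("echo hello")
print(record)
```

As with any shell-out, the command string should come from a trusted source, since it is handed to the shell as-is.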
5.5.0 01 Sep 2019 11:45 minor feature: The new positional-indexing feature resolves #236 from @aborruso. You can now get the name of the 3rd field of each record via $[[3]], and its value by $[[[3]]]. These are both usable on either the left-hand or right-hand side of assignment statements, so you can more easily do things like renaming fields programmatically within the DSL. There is a new capitalize DSL function, complementing the already-existing toupper. This stems from #236. There is a new skip-trivial-records verb, resolving #197. Similarly, there is a new remove-empty-columns verb, resolving #206. Both are useful for data-cleaning use-cases. Another pair is #181 and #256. While Miller uses mmap internally (and invisibly) to get approximately a 20% performance boost over not using it, this can cause out-of-memory errors when reading either large files, or too many small ones. Now, Miller automatically avoids mmap in these cases. You can still use --mmap or --no-mmap if you want manual control of this. There is a new --ivar option for the nest verb which complements the already-existing --evar. This is from #260 thanks to @jgreely. There is a new keystroke-saving urandrange DSL function: urandrange(low, high) is the same as low + (high - low) * urand(). There is a new -v option for the cat verb which writes a low-level record-structure dump to standard error. There is a new -N option for mlr which is a keystroke-saver for --implicit-csv-header --headerless-csv-output. The new FAQ entry http://johnkerl.org/miller/doc/faq.html#How_to_escape_'?'_in_regexes resolves #203. The new FAQ entry http://johnkerl.org/miller/doc/faq.html#How_can_I_filter_by_date resolves #208. A documentation bug was fixed while highlighting the need for #241. A SEGV using nest within then-chains was fixed in response to #220. Quotes and backslashes weren't being escaped in JSON output with --jvquoteall; reported on #222.
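The urandrange identity stated in the 5.5.0 notes can be written out directly. The following Python sketch mirrors the formula low + (high - low) * urand(); the urand parameter is an addition for this example so that the uniform source can be swapped for a fixed value when checking the arithmetic.

```python
import random


def urandrange(low, high, urand=random.random):
    """Scale a uniform draw from [0, 1) onto the interval starting at low."""
    return low + (high - low) * urand()


# With a fixed "random" draw of 0.5 the result is the midpoint of the range.
midpoint = urandrange(5, 10, urand=lambda: 0.5)
samples = [urandrange(5, 10) for _ in range(1000)]
print(midpoint, min(samples), max(samples))
```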
5.4.0 15 Oct 2018 06:25 minor feature: The new clean-whitespace verb resolves #190 from @aborruso. Along with the new functions strip, lstrip, rstrip, collapse_whitespace, and clean_whitespace, there is now both coarse-grained and fine-grained control over whitespace within field names and/or values. See the linked-to documentation for examples. The new altkv verb resolves #184 which was originally opened via an email request. This supports mapping value-lists such as a,b,c,d to alternating key-value pairs such as a=b,c=d. The new fill-down verb resolves #189 by @aborruso. See the linked-to documentation for examples. The uniq verb now has a uniq -a which resolves #168 from @sjackman. The new regextract and regextract_or_else functions resolve #183 by @aborruso. The new ssub function arises from #171 by @dohse, as a simplified way to avoid escaping characters which are special to regular-expression parsers. There are new localtime functions in response to #170 by @sitaramc. However note that as discussed on #170 these do not undo one another in all circumstances. This is a non-issue for timezones which do not do DST. Otherwise, please use with disclaimers: localdate, localtime2sec, sec2localdate, sec2localtime, strftime_local, and strptime_local. Windows build-artifacts are now available in Appveyor at https://ci.appveyor.com/project/johnkerl/miller/build/artifacts, and will be attached to this and future releases. This resolves #167, #148, and #109. Travis builds at https://travis-ci.org/johnkerl/miller/builds now run on OSX as well as Linux. An Ubuntu 17 build issue was fixed by @singalen on #164. put/filter documentation was confusing as reported by @NikosAlexandris on #169. The new FAQ entry http://johnkerl.org/miller-releases/miller-head/doc/faq.html#How_to_rectangularize_after_joins_with_unpaired? resolves #193 by @aborruso.
The new cookbook entry http://johnkerl.org/miller/doc/cookbook.html#Options_for_dealing_with_duplicate_rows arises from #168 from @sjackman. . Th
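The altkv mapping described in the 5.4.0 notes (a,b,c,d becomes a=b,c=d) can be sketched in a few lines of Python. This is only an illustration of the pairing idea; the function name is borrowed from the verb, and how Miller itself treats an odd number of values is not covered here, so the sketch simply rejects that case.

```python
def altkv(values):
    """Pair up alternating values as key, value, key, value, ..."""
    if len(values) % 2 != 0:
        # Assumption for this sketch only: odd-length input is out of scope.
        raise ValueError("this sketch only handles an even number of values")
    return dict(zip(values[0::2], values[1::2]))


pairs = altkv(["a", "b", "c", "d"])
print(",".join(f"{k}={v}" for k, v in pairs.items()))
```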
5.3.0 07 Jan 2018 22:45 minor feature: Comment strings in data files: mlr --skip-comments allows you to filter out input lines starting with #, for all file formats. Likewise, mlr --skip-comments-with X lets you specify the comment-string X. Comments are only supported at start of data line. mlr --pass-comments and mlr --pass-comments-with X allow you to forward comments to program output as they are read. The count-similar verb lets you compute cluster sizes by cluster labels. While Miller DSL arithmetic gracefully overflows from 64-bit integer to double-precision float (see also here), there are now the integer-preserving arithmetic operators .+ .- .* ./ .// for those times when you want integer overflow. There is a new bitcount function: for example, echo x=0xf0000206 | mlr put '$y=bitcount($x)' produces x=0xf0000206,y=7. mlr -T is an alias for --nidx --fs tab, and mlr -t is an alias for mlr --tsvlite. The mathematical constants π and e have been renamed from PI and E to M_PI and M_E, respectively. (It's annoying to get a syntax error when you try to define a variable named E in the DSL, when A through D work just fine.) This is a backward incompatibility, but not enough of one to justify calling this release Miller 6.0.0. As noted here, while Miller has its own DSL there will always be things better expressible in a general-purpose language. The new page Sharing data with other languages shows how to seamlessly share data back and forth between Miller, Ruby, and Python. SQL-input examples and SQL-output examples contain detailed information on the interplay between Miller and SQL. A question was raised about suppressing numeric conversion. This resulted in a new FAQ entry How do I suppress numeric conversion?, as well as a longer-term follow-on which will make numeric conversion happen on a just-in-time basis. To my surprise, csvlite format options weren't listed in mlr --help or the manpage. This has been fixed.
Documentation for auxiliary commands has been expanded, including with
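The bitcount example and the two flavors of integer arithmetic in the 5.3.0 notes can be illustrated in Python. This is a sketch under stated assumptions, not Miller's code: Python integers are unbounded, so 64-bit behavior is emulated by hand, and the names plus and dot_plus are invented stand-ins for the + and .+ operators.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1
MASK64 = 2 ** 64 - 1


def bitcount(x):
    """Count the 1-bits of x viewed as a 64-bit value."""
    return bin(x & MASK64).count("1")


def plus(a, b):
    """Auto-overflowing add: fall back to float when the sum leaves int64."""
    s = a + b
    return s if INT64_MIN <= s <= INT64_MAX else float(a) + float(b)


def dot_plus(a, b):
    """Integer-preserving add: wrap around like two's-complement int64."""
    s = (a + b) & MASK64
    return s - 2 ** 64 if s > INT64_MAX else s


print(bitcount(0xF0000206))
print(plus(INT64_MAX, 1), dot_plus(INT64_MAX, 1))
```

The contrast is the point: the first add trades exactness for magnitude, the second keeps an integer and accepts the wraparound.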
5.2.2 21 Jul 2017 14:25 minor feature: Post-5.2.1. ./configure without autoreconf, for travis. Touch aclocal.m4/configure. Touch configure.ac. Travis build while debugging autoreconf thingie. Int32/int64 in regtest. More big-endian fixing. Neaten. Neaten. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Performance-doc mods. Doc neaten. Neaten. Neaten. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Appveyor iterate. Neaten. Appveyor iterate / nlnet_timegm. Appveyor iterate. Neaten. Appveyor iterate: mlr.exe. Appveyor iterate: mlr.exe. Appveyor iterate: mlr.exe + msys-2.0.dll. Appveyor iterate: attempt static build. Appveyor iterate: do not attempt static build (needs work). Appveyor iterate: -static. Typo. Appveyor iterate: -static. Appveyor iterate. Appveyor doc. Todo. README neaten. 64-bit updates to percentile_keeper. Additional seqgen test case. Initial streaming data-generator pseudo-reader; pending docs. Whitespace. Travis build. Travis build. Travis build. Build. Todo. 5.2.2 release.
5.2.1 20 Jun 2017 08:45 minor feature: 5.2.0 release-specific docs. Todo. Doc neaten; post-5.2.0. Lemon signed-unsigned hashing for #142. 5.2.1.
5.2.0 14 Jun 2017 06:25 minor feature: The stats1 verb now lets you use regular expressions to specify which field names to compute statistics on, and/or which to group by. Full details are here. The min and max DSL functions, and the min/max/percentile aggregators for the stats1 and merge-fields verbs, now support numeric as well as string field values. (For mixed string/numeric fields, numbers compare before strings.) This means in particular that order statistics -- min, max, and non-interpolated percentiles -- as well as mode, antimode, and count are now possible on string-only (or mixed) fields. (Of course, any operations requiring arithmetic on values, such as computing sums, averages, or interpolated percentiles, yield an error on string-valued input.) There is a new DSL function mapexcept which returns a copy of the argument with specified key(s), if any, unset. The motivating use-case is to split records to multiple filenames depending on particular field value, which is omitted from the output: mlr --from f.dat put 'tee > "/tmp/data-".$a, mapexcept($*, "a")'. Likewise, mapselect returns a copy of the argument with only specified key(s), if any, set. This resolves #137. A new -u option for count-distinct allows unlashed counts for multiple field names. For example, with -f a,b and without -u, count-distinct computes counts for distinct pairs of a and b field values. With -f a,b and with -u, it computes counts for distinct a field values and counts for distinct b field values separately. If you build from source, you can now do ./configure without first doing autoreconf -fiv. This resolves #131. The UTF-8 BOM sequence 0xef 0xbb 0xbf is now automatically ignored from the start of CSV files. (The same is already done for JSON files.) This resolves #138. For put and filter with -S, program literals such as the 6 in x = 6 were being parsed as strings.
This is not sensible, since the -S option for put and filter is intended to suppress numeric conversion of record data, not p
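The difference between lashed and unlashed counting described for count-distinct -u in the 5.2.0 notes can be sketched with Python's Counter. The function below is an illustration of the two counting modes on in-memory records, with invented sample data; it does not reproduce Miller's output layout.

```python
from collections import Counter


def count_distinct(records, fields, unlashed=False):
    """Count distinct value tuples (lashed) or distinct values per field (unlashed)."""
    if unlashed:
        return {f: Counter(r[f] for r in records) for f in fields}
    return Counter(tuple(r[f] for f in fields) for r in records)


records = [
    {"a": "pan", "b": "x"},
    {"a": "pan", "b": "y"},
    {"a": "eks", "b": "x"},
]
lashed = count_distinct(records, ["a", "b"])
unlashed = count_distinct(records, ["a", "b"], unlashed=True)
print(lashed)
print(unlashed)
```

Here every (a, b) pair occurs once, while the per-field counts show that pan and x each occur twice.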
5.1.0w 19 Apr 2017 08:25 minor feature: Detabify in template.html. 5.1.0 release-specific docs. Post-5.1.0.
5.1.0 15 Apr 2017 03:16 minor feature: JSON arrays: as described here, Miller being a tabular data processor isn't well-positioned to handle arbitrary JSON. (See jq for that.) But as of 5.1.0, arrays are converted to maps with integer keys, which are then at least processable using Miller. Details are here. The short of it is that you now have three options for the main mlr executable. The new mlr fraction verb makes possible in a few keystrokes what was only possible before using two-pass DSL logic: here you can turn numerical values down a column into their fractional/percentage contribution to column totals, optionally grouped by other key columns. The DSL functions strptime and strftime now handle fractional seconds. For parsing, use %S format as always; for formatting, there are now %1S through %9S which allow you to configure a specified number of decimal places. The return value from strptime is now floating-point, not integer, which is a minor backward incompatibility not worth labeling this release as 6.0.0. (You can work around this using int(strptime(...)).) The DSL functions gmt2sec and sec2gmt, which are keystroke-savers for strptime and strftime, are similarly modified, as is the sec2gmt verb. This resolves #125. A few nearly-standalone programs -- which do not have anything to do with record streams -- are packaged within Miller. (For example, hex-dump, unhex, and show-line-endings commands.) These are described here. The stats1 and merge-fields verbs now support an antimode aggregator, in addition to the existing mode aggregator. The join verb now by default does not require sorted input, which is the more common use case. (Memory-parsimonious joins which require sorted input, while no longer the default, are available using -s.) This is another minor backward incompatibility not worth making a 6.0.0 over. This resolves #134. mlr nest has a keystroke-saving --evar option for a common use case, namely, exploding a field by value across records.
. The DSL referenc
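The two-pass logic that the fraction verb replaces, as described in the 5.1.0 notes, can be sketched in Python: a first pass accumulates totals (optionally per group), and a second pass divides each value by its total. The function signature, the sample records, and the output field suffix _fraction are assumptions made for this illustration.

```python
from collections import defaultdict


def fraction(records, field, group_by=None):
    """Add each value's share of its column total, optionally per group."""
    # Pass 1: accumulate totals per group (a single group if group_by is None).
    totals = defaultdict(float)
    for r in records:
        key = r[group_by] if group_by else None
        totals[key] += float(r[field])
    # Pass 2: divide each value by the total of its group.
    out = []
    for r in records:
        key = r[group_by] if group_by else None
        out.append({**r, field + "_fraction": float(r[field]) / totals[key]})
    return out


records = [
    {"g": "u", "x": "1"},
    {"g": "u", "x": "3"},
    {"g": "v", "x": "5"},
]
overall = fraction(records, "x")
grouped = fraction(records, "x", group_by="g")
print([r["x_fraction"] for r in grouped])
```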
5.0.113 Mar 2017 03:16 minor feature: 5.0.0 release-specific docs. Post 5.0.0. Todo. Configure doc. Post-5.0.0 feedback. Doc background neaten. Build-doc neatens. Todo. Build-doc refines. Json-skip-arrays-on-input feature iterate. Json-skip-arrays-on-input doc. Todo. Feature-count example @ cookbook. Todo. Doc neaten. Per-function doc as separate h2's. Msys2 configs. Todo. Todo. Remove tools/termcvt cruft (now in lib/aux_entries). Windows-port iterate. Windows-port iterate. Windows-port iterate. Todo. Todo. Windows-port iterate. Windows-port iterate. Windows-port iterate. Windows-port iterate. Windows-port iterate. Windows checkout CRLF vs. LF local checkin. Todo. Windows checkout CRLF vs. LF local checkin. Merge branch 'master' of https://github.com/johnkerl/miller. Merge branch 'master' of https://github.com/johnkerl/miller. Merge branch 'master' of https://github.com/johnkerl/miller. Unb04k windows checkin. Windows line endings, take two. Windows line endings, take two. Todo. Windows line endings, UT. Windows line endings, UT. Windows line endings, UT. Windows line endings, UT. Windows line endings, UT. Windows line endings, UT. Windows line endings, UT. Windows line endings, UT. Merge branch 'master' of github.com:johnkerl/miller. Windows-port iterate. Windows-port iterate. Windows-port iterate. Windows-port iterate. Windows-port iterate. Local_getdelim iterate. Local-getdelim iterate. Aux-entries iterate. Aux-entries iterate. Aux-entries iterate. Aux-entries iterate. Test-line-readers (windows-port iterate). Test-line-readers (windows-port iterate). Aux-entries iterate. Aux-entries iterate. Test-line-readers (windows-port iterate). Todo. Travis build. Neaten. Todo. Windows-port iterate. Windows-port iterate. Todo. Line-reader iterate. Line-reader iterate. Line-reader iterate. Merge. Merge. Line-reader iterate. Line-reader iterate. Line-reader iterate. Line-reader iterate. Line-reader iterate. Strtok - strmsep. Strsep - strmsep. Line-reader iterate. Neaten. 
Corner-case in multi-ips non-mmap xtab l
5.0.0 01 Mar 2017 19:05 minor feature: Line endings (CRLF vs. LF, Windows-style vs. Unix-style) are now autodetected. For example, files (including CSV) with LF input will lead to LF output unless you specify otherwise. There is now an in-place mode using mlr -I. You can now define your own functions and subroutines: e.g. func f(x, y) { return x**2 + y**2 }. New local variables are completely analogous to out-of-stream variables: sum retains its value for the duration of the expression it's defined in; @sum retains its value across all records in the record stream. Local variables, function parameters, and function return types may be defined untyped or typed as in x = 1 or int x = 1, respectively. There are also expression-inline type-assertions available. Type-checking is up to you: omit it if you want flexibility with heterogeneous data; use it if you want to help catch misspellings in your DSL code or unexpected irregularities in your input data. There are now four kinds of maps. Out-of-stream variables have always been scalars, maps, or multi-level maps: @a=1, @b[1]=2, @c[1][2]=3. The same is now true for local variables, which are new to 5.0.0. Stream records have always been single-level maps; $* is a map. And as of 5.0.0 there are now map literals, e.g. {"a":1, "b":2}, which can be defined using JSON-like syntax (with either string or integer keys) and which can be nested arbitrarily deeply. You can loop over maps -- $*, out-of-stream variables, local variables, map-literals, and map-valued function return values -- using for (k, v in ...) or the new for (k in ...) (discussed next). All flavors of map may also be used in emit and dump statements. User-defined functions and subroutines may take map-valued arguments, and may return map values. Some built-in functions now accept map-valued input: typeof, length, depth, leafcount, haskey. There are built-in functions producing map-valued output: mapsum and mapdiff.
There are now string-to-map and map-to-string functions: splitnv, splitkv, splitnvx, splitkv
4.5.022 Aug 2016 19:05 minor feature: Miller 4.4.0 release-specific docs. Post-4.4.0. Todo. Todo. Todo. Cli function splits. Cli function splits. Neaten. Neaten. Cli function splits. Cli function splits. Cli function splits. Cli function splits. Neaten. Cli function splits. Cli function splits. Cli function splits. Todo. Cli function splits. Cli function splits. Cli function splits. Cli function splits. Cli function splits. Cli function splits. Mapper tee now with its own format flags (online help TBD). Mapper put format flags iterate. Mapper put now with its own format flags (online help TBD). UTs for tee/put with different output formats. Global-opts refactor. Global-opts refactor. Mlr_globals factor-away. Neaten. Docs for formatting flags for redirected outputs. Docs for formatting flags for redirected outputs. Neaten. Neaten. Neaten. Valgrind findings. --no-flush in addition to --no-fflush (typo tolerance). Neaten. Todo. Todo. 4.5.0. Typo.
4.4.0 13 Aug 2016 03:15 minor feature: mlr step -a shift allows you to place the previous record's values alongside the current record's values: http://johnkerl.org/miller/doc/reference.html#step. mlr head, when used without the group-by flag (-g), stops after the specified number of records has been output. For example, even with a multi-gigabyte data file, mlr head -n 10 hugefile.dat will complete quickly after producing the first ten records from the file. The sec2gmtdate verb, and sec2gmtdate function for filter/put, is new: please see http://johnkerl.org/miller/doc/reference.html#sec2gmtdate and http://johnkerl.org/miller/doc/reference.html#Functions_for_filter_and_put. sec2gmt and sec2gmtdate both leave non-numbers as-is, rather than formatting them as (error). This is particularly relevant for formatting nullable epoch-seconds columns in SQL-table output: if a column value is NULL then after sec2gmt or sec2gmtdate it will still be NULL. The dot operator has been universalized to work with any data type and produce a string. For example, if the field n has integers, then instead of typing mlr put '$name = "value:".string($n)' you can now simply do mlr put '$name = "value:".$n'. This is particularly timely for creating filenames for redirected print/dump/tee/emit output. The online documents now have a copy of the Miller manpage: http://johnkerl.org/miller/doc/manpage.html. Bug fix: inside filter/put, $x=="" was distinct from isempty($x). This was nonsensical; now both are the same.
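The shift stepper from the 4.4.0 notes, which places the previous record's value alongside the current one, can be sketched as a single streaming pass in Python. The output field suffix _shift and the "-" placeholder for the first record are assumptions of this illustration, as is the sample data.

```python
def step_shift(records, field, placeholder="-"):
    """Attach the previous record's value of `field` to each record."""
    previous = placeholder  # the first record has no predecessor
    out = []
    for r in records:
        out.append({**r, field + "_shift": previous})
        previous = r[field]
    return out


records = [{"t": 1, "x": 10}, {"t": 2, "x": 12}, {"t": 3, "x": 9}]
shifted = step_shift(records, "x")
print(shifted)
```

Because only one prior value is remembered, this runs in bounded memory regardless of input size.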
4.3.0 04 Jul 2016 06:45 minor feature: Interpolated percentiles are now available using mlr stats1 -i or mlr merge-fields -i. Non-interpolated percentiles are the default. The former resemble R's type=7 quantiles and the latter resemble R's type=1 quantiles. See also http://johnkerl.org/miller/doc/reference.html#stats1 and http://johnkerl.org/miller/doc/reference.html#merge-fields. Markdown-tabular output format is now available using --omd: please see http://johnkerl.org/miller/doc/file-formats.html#Markdown_tabular and #106. For files using CSV input as well as CSV output, there is now a --quote-original option which outputs fields with quotes if they had them on input. The was-quoted flag isn't tracked on derived fields, e.g. if fields a and b were quoted on input, then in mlr put '$c = $a . $b' the c field won't be quoted on output. As such, this option is most useful with mlr cut, mlr filter, etc. The use-case from the original feature request #77 (comment) is in trimming down a huge CSV file in order to facilitate subsequent in-memory processing using spreadsheet software. The cookbook at http://johnkerl.org/miller/doc/cookbook.html has been extended significantly. You can now set a MLR_CSV_DEFAULT_RS=lf environment variable if you're tired of always putting --rs lf arguments for your CSV files: http://johnkerl.org/miller/doc/file-formats.html#CSV/TSV/etc. The printn and eprintn commands for mlr put are identical to print and eprint except they don't print final newlines. It is now an error if boundvars in the same for-loop expression have duplicate names, e.g. for (a,a in $*) {...} results in the error message mlr: duplicate for-loop boundvars "a" and "a". The strptime function would announce an internal coding error on malformed format strings; now, it correctly points out the user-level error. Percentiles in merge-fields were not working. This was fixed; also, the lacking unit-test cases which would have caught this sooner have been filled in.
Miller's CSV output-quoting was non-RFC-compliant: double-q
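The 4.3.0 notes compare the two percentile flavors to R's type=1 and type=7 quantiles. The Python sketch below implements those two R definitions on a pre-sorted list, to show how the interpolated and non-interpolated answers can differ. It follows the R definitions the notes point to, which Miller is only said to resemble; it is not a transcription of Miller's own percentile code.

```python
import math


def percentile_type1(sorted_values, p):
    """Non-interpolated: inverse of the empirical CDF (R type=1), 0 <= p <= 1."""
    n = len(sorted_values)
    index = max(math.ceil(n * p) - 1, 0)
    return sorted_values[index]


def percentile_type7(sorted_values, p):
    """Interpolated: linear between neighboring order statistics (R type=7)."""
    n = len(sorted_values)
    h = (n - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


data = [10, 20, 30, 40]
median_type1 = percentile_type1(data, 0.5)
median_type7 = percentile_type7(data, 0.5)
print(median_type1, median_type7)
```

The non-interpolated result is always one of the input values, which is why it also makes sense for string-valued data, while the interpolated result may fall between two inputs.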
4.2.021 Jun 2016 03:15 minor feature: Doc neaten. 4.1.0 release-specific docs. Post-4.1.0. Todo. Todo. Todo. Mlr@gentoo link. Sync cover-page examples to manpage USAGE EXAMPLES section. Two-pass cookbook example. Doc neaten. Todo. Comment. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Ast-print wording change (bigdiff for regtest expected-output file). Lashed-emit iterate. AST simplify for emit/emitp. AST simplify for emit/emitp. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Lashed-emit iterate. Todo. Todo. Todo. Lashed-emit iterate. Lashed-emit iterate UT. Lashed-emit iterate UT. Canonical lashed-emit UTs. Lashed-emit UTs. Lashed-emit docs. Update homebrew link. Todo. Valgrind findings. Neaten. More lashed-emit UT cases. Todo. Put -f to covers x 3. Todo. Doc neaten. Todo. 4.2.0. 4.2.0. 4.2.0. 4.2.0. Todo.
4.1.0 12 Jun 2016 11:45 minor feature: For-loops over key-value pairs in stream records and out-of-stream variables. Loops using while and do while. Break and continue in for, while, and do while loops. If-elif-else statements. Nestability of all the above, as well as of existing pattern-action blocks. Computable field names using square brackets, e.g. $[$a.$b] = $a * $b. Type-predicate functions: isnumeric, isint, isfloat, isbool, isstring. Commenting using pound signs. The new print and eprint allow formatting of arbitrary expressions to stdout/stderr, respectively. In addition to the existing dump which formats all out-of-stream variables to stdout as JSON, the new edump does the same to stderr. Semicolon is no longer required after closing curly brace. emit @* and unset @* are new synonyms for emit all and unset all. Unset now exists. mlr -n is synonymous with mlr --from /dev/null, which is useful in dataless contexts wherein all your put statements are contained within begin/end blocks. Bug fix: in 4.0.0, mlr put -v '@a[1][2]=$b; $new=@a[1][2]' mydata.tbl would crash with a memory-management error. http://johnkerl.org/miller/doc/reference.html#If-statements_for_put. http://johnkerl.org/miller/doc/reference.html#While_and_do-while_loops_for_put. http://johnkerl.org/miller/doc/reference.html#For-loops_for_put. http://johnkerl.org/miller/doc/reference.html#Field_names_for_filter. http://johnkerl.org/miller/doc/reference.html#Field_names_for_put. http://johnkerl.org/miller/doc/reference.html#Functions_for_filter_and_put. http://johnkerl.org/miller/doc/reference.html#Semicolons,_newlines,_and_curly_braces_for_put. http://johnkerl.org/miller/doc/cookbook.html.
4.0.0 10 May 2016 13:45 minor feature: http://johnkerl.org/miller/doc/reference.html#put. http://johnkerl.org/miller/doc/reference.html#Out-of-stream_variables_for_put. http://johnkerl.org/miller/doc/reference.html#Pattern-action_blocks_for_put. http://johnkerl.org/miller/doc/reference.html#Begin/end_blocks_for_put. http://johnkerl.org/miller/doc/reference.html#Indexed_out-of-stream_variables_for_put. http://johnkerl.org/miller/doc/reference.html#Emit_statements_for_put. http://johnkerl.org/miller/doc/reference.html#Unset_statements_for_put. http://johnkerl.org/miller/doc/cookbook.html#Using_out-of-stream_variables. Compound assignment operators such as +=,
3.5.0 05 Apr 2016 03:17 minor feature: mlr nest is a companion to mlr reshape which was introduced in Miller 3.4.0: it allows unpacking key-value pairs which are nested within field values, and repacking them. Please see http://johnkerl.org/miller/doc/reference.html#nest. mlr shuffle is a simple output-record permutor: http://johnkerl.org/miller/doc/reference.html#shuffle. mlr repeat can be used as a data-generator, to expand a few input records (or even a single one) into arbitrarily many. This is particularly useful in conjunction with pseudorandom-number generators. As well, it can be used to reconstruct individual samples from data which have been count-aggregated, so that statistics such as mode, percentiles, etc. may be computed on them. Please see http://johnkerl.org/miller/doc/reference.html#repeat. mlr put and mlr filter now accept a -f filename option, so that the DSL expression may be placed within a file instead of being typed out on the command line when desired. Please see http://johnkerl.org/miller/doc/reference.html#put and http://johnkerl.org/miller/doc/reference.html#filter. Put/filter DSL string literals now may include \t, \", etc.: e.g. mlr put '$out = $left . "\t" . $right'. There is now a typeof function for the put/filter DSLs: mlr put '$xtype = typeof($x)'. This is occasionally useful for debugging type-conversion questions. You may now do mlr --nr-progress-mod 1000000 ... to get something printed to stderr every 1000000th input record, and so on. For long-running aggregations on large input file(s), this can provide reassurance that processing is indeed proceeding apace. mlr cat -n had a bug wherein it counted zero-up while its documentation claimed it counted one-up. Now it counts one-up as documented.
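The use of repeat to reconstruct individual samples from count-aggregated data, as described in the 3.5.0 notes, can be sketched in Python. The function name, the count field name, and the sample data are invented for the illustration; each record is simply emitted as many times as its count field says.

```python
from collections import Counter


def repeat_by_count(records, count_field):
    """Emit each record as many times as its count field indicates."""
    out = []
    for r in records:
        out.extend(dict(r) for _ in range(int(r[count_field])))
    return out


# Count-aggregated input: value "a" was seen twice, "b" once.
aggregated = [{"v": "a", "count": "2"}, {"v": "b", "count": "1"}]
samples = repeat_by_count(aggregated, "count")

# With individual samples restored, per-sample statistics such as mode work.
mode_value, mode_count = Counter(s["v"] for s in samples).most_common(1)[0]
print(len(samples), mode_value, mode_count)
```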
3.4.0 15 Feb 2016 03:15 minor feature: JSON is now a supported format for input and output. Miller handles tabular data, and JSON supports arbitrarily deeply nested data structures, so if you want general JSON processing you should use jq. But if you have tabular data represented in JSON then Miller can now handle that for you. Please see the reference page and the FAQ. Reshape is a standard data-processing idiom, now available in Miller: http://johnkerl.org/miller/doc/reference.html#reshape. Incidentally (not part of this release, but new since the last release) Miller is now available in FreeBSD's package manager: https://www.freshports.org/textproc/miller/. A full list of distributions containing Miller may be found here. Miller is not yet available from within Fedora/CentOS, but as a step toward this goal, an SRPM is included in this release (see file-list below). Regex captures \0 through \9: http://johnkerl.org/miller/doc/reference.html#Regex_captures. Ternary operator in expression right-hand sides: e.g. mlr put '$y = $x < 0.5 ? 0 : 1'. Boolean literals true and false. Final semicolon is now allowed: e.g. mlr put '$x=1; $y=2;'. Environment variables are now accessible, where environment-variable names may be string literals or arbitrary expressions: mlr put '$home = ENV["HOME"]' or mlr put '$value = ENV[$name]'. While records are still string-to-string maps for input and output, and between then statements, types are preserved between multiple statements within a put. Example: mlr put '$y = string($x); $z = $y . $y' works as expected, without requiring mlr put '$y = string($x); $z = string($y) . string($y)' as before. Mixed-format join, e.g. CSV file joined with DKVP file, was incorrectly computing default separators (IRS, IFS, IPS). This resulted in records not being joined together. Segmentation violation on non-standard-input read of files with size an exact multiple of page size and not ending in IRS, e.g. newline.
(This is less of a corner case than it sounds: for example, leave a long-running pro
3.3.2 12 Jan 2016 06:05 minor feature: Bootstrap sampling in mlr bootstrap: http://johnkerl.org/miller/doc/reference.html#bootstrap. Compare to reservoir sampling in mlr sample: http://johnkerl.org/miller/doc/reference.html#sample. Exponentially weighted moving averages in mlr step -a ewma: principally useful for smoothing of noisy time series, e.g. finely sampled system-resource utilization to give one of many possible examples. Please see http://johnkerl.org/miller/doc/reference.html#step. "Horizontal" univariate statistics in mlr merge-fields, compared to mlr stats which is "vertical". Also allows collapsing multiple fields into one, such as in_bytes and out_bytes data fields summing to bytes_sum. This can also be done easily using mlr put. However, mlr merge-fields allows aggregation of more than just a pair of field names, and supports pattern-matching on field names. Please see http://johnkerl.org/miller/doc/reference.html#merge-fields for more information. isnull and isnotnull functions for mlr filter and mlr put. stats1, stats2, merge-fields, step, and top correctly handle not only missing fields (in the row-heterogeneous-data case) but also null-valued fields. Minor memory-management improvements.
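The exponentially weighted moving average mentioned in the 3.3.2 notes can be sketched in Python as the usual recurrence: the first smoothed value is the first input, and each later one blends the new input with the previous smoothed value. The recurrence and the name alpha for the smoothing weight are the textbook form, assumed here for illustration; the notes themselves do not spell out the formula.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average with smoothing weight alpha."""
    smoothed = []
    prev = None
    for x in values:
        # Seed with the first value, then blend new input with prior state.
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed


noisy = [1.0, 2.0, 3.0]
smooth = ewma(noisy, alpha=0.5)
print(smooth)
```

A larger alpha tracks the input more closely; a smaller alpha smooths more heavily. Only the previous smoothed value is kept, so the computation streams.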
3.2.2 30 Dec 2015 03:16 minor feature: RFC-CSV read performance is dramatically improved and is now on par with other formats; read performance for all formats is slightly improved as well. Variable names can now be escaped, using curly braces if there are special characters in the input-data field names. Example: mlr put '${bytes.total} = ${bytes.in} + ${bytes.out}'. See also #77 where this was requested. Compressed I/O is now supported, using built-in compatibility with local system tools: http://johnkerl.org/miller/doc/reference.html#Compression. See also #77 where this was requested. mlr uniq is now streaming (bounded memory use, usable in tail -f contexts) when possible: i.e. when -n and -c are not specified. Thorough valgrind-driven testing has been used to tighten memory usage. This is mostly an invisible internal improvement, although it has a slight across-the-board performance improvement as well as allowing Miller to handle even larger files in limited-memory contexts.
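The distinction between streaming and non-streaming uniq can be sketched as a Python generator. This is an illustration of the general idea under assumed names (`streaming_uniq`, `group_by`), not Miller's code: a record is emitted the first time its key is seen, so output can begin before input ends, while counting modes would have to wait for end of input because counts are not final until then.

```python
def streaming_uniq(records, group_by):
    """Yield each record whose group-by key has not been seen before."""
    seen = set()
    for rec in records:
        key = tuple(rec.get(f) for f in group_by)
        if key not in seen:
            seen.add(key)  # memory grows with distinct keys, not with records
            yield rec


rows = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 2, "b": 3}]
distinct = list(streaming_uniq(rows, ["a"]))
```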
3.1.2 10 Dec 2015 03:16 minor feature: Miller 3.1.1 release docs. Post-3.1.1. Neaten. Csv-read performance iterate. Reg_test/run neaten. Csv-read performance iterate. Neaten. Csv-read performance iterate. Neaten. Miller 3.1.2.
3.1.0 06 Dec 2015 03:15 minor feature: Portability fixes (affecting the CSV-RFC reader) for the Debian packaging request: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800074. The latter greatly increases the number of platforms on which Miller has been validated. mlr decimate: http://johnkerl.org/miller/doc/reference.html#decimate. Integer-preservation feature for mlr top and mlr stats1 with percentiles: if inputs are integers then corresponding outputs will be so as well (unless -F is given, which forces all-float output). mlr histogram now has a --auto option for autocomputing lower and upper limits: http://johnkerl.org/miller/doc/reference.html#histogram. mlr uniq and mlr count-distinct now have a -n flag to show only the counts of distinct values, rather than listing all distinct values: http://johnkerl.org/miller/doc/reference.html#uniq http://johnkerl.org/miller/doc/reference.html#count-distinct. The strlen function correctly handles UTF-8 string data.
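Why UTF-8 needs special handling in a string-length function can be shown in a few lines. The sketch below uses one common technique, counting every byte that is not a UTF-8 continuation byte; it is an assumption that this resembles what a C implementation would do, and `utf8_strlen` is a hypothetical name.

```python
def utf8_strlen(raw: bytes) -> int:
    """Count code points by skipping continuation bytes (bit pattern 10xxxxxx)."""
    return sum(1 for b in raw if (b & 0xC0) != 0x80)


text = "na\u00efve"            # five characters; U+00EF takes two bytes in UTF-8
encoded = text.encode("utf-8")
byte_length = len(encoded)      # what a naive byte count would report
char_length = utf8_strlen(encoded)
```

A byte-counting strlen over-reports the length of any string containing non-ASCII characters, which in turn would misalign padded output.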
3.0.1 01 Dec 2015 13:25 minor feature: Miller has always supported scientific notation in field values, e.g. x=1e6. However, it had never supported scientific notation in DSL literals, e.g. mlr put '$y = $x + 1e6'. This release fixes that. Additionally, mlr bar now has a --auto flag which holds all records in memory and computes limits from the data, so you don't have to compute them separately and pass them in via --lo and --hi.
2.3.2 28 Oct 2015 05:25 minor feature: mlr stats1 and stats2 now support a -s feature in which means, linear regressions, etc. evolve record-by-record as new records appear over time. This is particularly useful in tail -f contexts. See also http://johnkerl.org/miller/doc/reference.html#stats1 and http://johnkerl.org/miller/doc/reference.html#stats2. mlr filter now supports a -x flag to negate the sense of the filter, instead of editing logic expressions by hand, e.g. from mlr filter '$x > 10 && $x < 20' to mlr filter '$x <= 10 || $x >= 20'. See also http://johnkerl.org/miller/doc/reference.html#filter. In the event a CSV file lacks header lines, you can use mlr --implicit-csv-header to add positional headers 1, 2, 3, .... You can also convert those to desired text using mlr label. See also http://johnkerl.org/miller/doc/reference.html#label. Heterogeneity support is improved for sort, stats1, stats2, step, head, tail, top, sample, uniq, and count-distinct. See also #79. mlr stats2 now has a logistic-regression feature, but I recommend treating it as experimental until some numerical-stability issues involving my naïve Newton-Raphson solver are worked out -- namely, it doesn't converge in all cases.
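The idea of statistics that evolve record by record can be sketched with an incrementally updated mean. This Python generator illustrates the general technique only; it is not Miller's implementation, and `running_means` is a hypothetical name.

```python
def running_means(values):
    """Yield the mean of all values seen so far, after each new value."""
    count = 0
    mean = 0.0
    for x in values:
        count += 1
        # Incremental update: no need to store or re-sum earlier values.
        mean += (x - mean) / count
        yield mean


means = list(running_means([2.0, 4.0, 6.0]))
```

Because each update needs only the previous mean and the count, the same loop works on an unbounded input such as a log being followed with tail -f.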
2.3.1 22 Oct 2015 06:45 minor feature: Miller 2.3.0. Logistic regression iterate. Todo. Neaten. mlr top -a. Top -a. Miller 2.3.1.
2.3.0 18 Oct 2015 22:25 minor feature: http://johnkerl.org/miller/doc/reference.html#Regular_expressions. http://johnkerl.org/miller/doc/reference.html#put. http://johnkerl.org/miller/doc/reference.html#filter. http://johnkerl.org/miller/doc/reference.html#having-fields. http://johnkerl.org/miller/doc/reference.html#cut. http://johnkerl.org/miller/doc/reference.html#rename. Initial delta for mlr step -a delta is now 0, matching initial 1 for mlr step -a ratio. Usage messages consistently go to stdout when asked for via -h, and stderr in case of command-line syntax errors. Online help is confined to 80-character column width, except for mlr -f which is all single-line greppable. Header/data length mismatch error messages for CSV/CSV-lite now include file/line context.
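The initial conditions described for the delta and ratio steppers can be sketched as follows. This is a Python illustration under stated assumptions: delta is taken to be the current value minus the previous one, and ratio the current value divided by the previous one, with the first record getting 0 and 1 respectively as the entry above describes. The function names are hypothetical.

```python
def step_delta(values):
    """Difference from the previous value; the first record gets 0."""
    out, prev = [], None
    for x in values:
        out.append(0 if prev is None else x - prev)
        prev = x
    return out


def step_ratio(values):
    """Quotient against the previous value; the first record gets 1."""
    out, prev = [], None
    for x in values:
        # Assumes no zero values precede another record (division by prev).
        out.append(1 if prev is None else x / prev)
        prev = x
    return out
```

Both initial values are the identity for their operation (0 for addition, 1 for multiplication), which is what makes the two steppers consistent with each other.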
2.2.1 24 Sep 2015 13:45 minor feature: Add unistd.h for access() and add return type for function. Start autoconf/automake based build system. Put autotools support files in their own directory. Two automake warnings. Remove AM_PROG_CC_C_O, not necessary any longer. Merge branch 'master' of https://github.com/johnkerl/miller. Automake: Run four test programs. Automake: add remaining unit tests. Automake: survive 'make distcheck'. Automake: bump version to 2.1.1. Automake: distribute README and LICENSE. Automake: build, distribute and install man page. Automake:gitignore: ignore distribution tarballs. Automake: handle lemon special files better. Automake: add dependency for man page on source file. Merge branch 'master' of https://github.com/johnkerl/miller. Automake: Update source lists for recent changes. Automake: remove dummy email address completely. .gitignore: update for test name change. Merge branch 'master' of https://github.com/johnkerl/miller. Automake: bump version to 2.1.3. Add missing source file. Automake: remove mlrvers.h, use version from configure.ac. Merge branch 'master' of https://github.com/johnkerl/miller. Automake: remove removed file from sources. Merge branch 'master' of https://github.com/johnkerl/miller. Automake: adapt for changes in 2.1.4. Automake: add most standalone mains. Automake: convert more custom targets to automake. Allow out-of-dir builds for test/run. Typo in comment. Automake: comment out two old programs. Automake: Descend into c/test and run test from there. Merge branch 'master' of https://github.com/johnkerl/miller. Let test/run work either with or without autoconfig. Proper invocations for test/run, for autoconf/non-autoconf. Neaten. Test/output/out should not be in source control. Merge branch 'master' of https://github.com/johnkerl/miller. Automake: update for changes on master. Automake: do not provide .tar.xz for now.
Automake: various fixes for c/dsls/Makefile.am. Merge branch 'master' of https://gi
2.2.0 21 Sep 2015 03:16 minor feature: Changing to sh. Update to sh. Update mkplots.sh. v2.1.4. Csv/stdin. Let test/run execute outside of its pwd (from 0-wiz-0/miller0. Add overdue unit-test case for reading from stdin. Neaten. Remove unused reference to memcheck. Neaten. Valgrind wasn't being used in any automated way. Let test/run write to ./output for VPATH builds. Test/README.md. Allow ORS/OFS/OPS to be multi-char. Neaten. Update getlines profiler/comparator. Neaten. Todo. Neaten. Multi-char-separator options for CSV. Let test/run work either with or without autoconfig. Proper invocations for test/run, for autoconf/non-autoconf. Neaten. on-line help for separators. UT cases for CSV formatting options. Neaten. Iterate on read-performance expe
2.1.4 12 Sep 2015 09:25 minor feature: v2.1.3. add static-build option for i686 buildbox. read performance iterate peek-file-reader v2. Merge branch 'master' of http://github.com/johnkerl/miller. doc links. read performance iterate peek-file-reader v2. read performance iterate internal-API reorg for peek-file-reader it?. read performance iterate peek-file-reader iterate. Take CC from environment for Travis gcc/clang build. read performance iterate CSV-parser integration. read performance iterate CSV performance tuning. neaten. read-performance iterate peek-file-reader UTs. read-performance iterate ring-buffer iterate. read-performance iterate ring-buffer complete. read-performance iterate old-to-new CSV cutover. CSV read performance:
2.1.3 07 Sep 2015 12:45 minor feature: Shrink data-file sizes.
2.1.1 07 Sep 2015 05:05 minor feature: Read-performance iterate. Update initial condition for step -a ratio. Todo. Neaten.
2.1.0 01 Sep 2015 06:45 minor feature: Todo. csv iterate doc updates. Todo. UTF-8 alignment for pprint and xtab formats. Neaten. Todo. Todo.txt. read-opt experiment iterate. Dhms iterate. Cast is*/to arguments to proper type. Dhms et al. iterate. Dhms et al., resolving #35. Starting basic travis support. Changed to container-based and do only make. Trying to do gcc and clang. Added a manpage for mlr. Makefile typos. Add comments. Manpage usage note. Manpage usage notes. Merge. Wiki link from top-level readme. Test/input - indir in test/run in all cases. Missing-lf iterate. sh test/run or sh ./test/run, neither should cause a spurious regress?. Missing-lf iterate. mmapped I/O with missing final newline. Iterating on Allow trailing spaces with --allow-repeat-ifs. mmapped I/O with missing final newline. Read performance iterate. Revert "read performance iterate". Read-performance iterate. Travis-test trigger via trivial commit. Add Travis Build Status Image to Readme.
2.0.0 28 Aug 2015 03:15 minor feature: --csv will still be compliant by default, but RS/FS will be programmable: you'll be able to handle TSV or what have you, with double-quote support. RS/FS/PS for all formats will be able to be multi-character, e.g. you'll be able to use CRLF for DKVP format which will resolve #19. Read performance for CSV will be optimized. Double-quoting will be supported in DKVP as well as in CSV.
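What multi-character RS/FS/PS means for a key-value format such as DKVP can be sketched with a small parser. This Python example is illustrative only and does not reproduce Miller's reader; `parse_dkvp` and its parameter names are hypothetical, and quoting is deliberately not handled.

```python
def parse_dkvp(text, rs="\n", fs=",", ps="="):
    """Split text into records (rs), fields (fs), and key/value pairs (ps)."""
    records = []
    for line in text.split(rs):
        if not line:
            continue  # skip the empty piece after a trailing record separator
        rec = {}
        for pair in line.split(fs):
            key, _, value = pair.partition(ps)
            rec[key] = value
        records.append(rec)
    return records


crlf_text = "a=1,b=2\r\nc=3,d=4\r\n"
crlf_records = parse_dkvp(crlf_text, rs="\r\n")
custom_records = parse_dkvp("a:=1;;b:=2", fs=";;", ps=":=")
```

With a single-character record separator of "\n", the CRLF input above would leave a stray "\r" on the last value of every record, which is the kind of problem multi-character separators avoid.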
1.0.1 25 Aug 2015 00:25 minor feature: HN feedbacks. Skeleton for full-rfc-csv i/o, preserving existing csvlite functional?. Internationalization note. Command-line-configurable install dir. Csv-rfc.txt. csv iterate string-builder iterate. Use CCOMP=gcc in c/dsls/Makefile as well as c/Makefile. Doc update re releases.
1.0.0 20 Aug 2015 14:25 major feature: Initial public release after reaching feature-stability for the author. Primary work for upcoming releases involves configuration (autotools et al.), packaging (homebrew, .deb), and RFC-compliant CSV.