helpTextVerbose
auto helpTextVerbose =
q"EOS
Synopsis: tsv-summarize [options] file [file...]
tsv-summarize reads tabular data files (tab-separated by default), tracks
field values for each unique key, and runs summarization algorithms. Consider
the file data.tsv:
make    color   time
ford    blue    131
chevy   green   124
ford    red     128
bmw     black   118
bmw     black   126
ford    blue    122
The min and average times for each make are generated by the command:
$ tsv-summarize --header --group-by 1 --min 3 --mean 3 data.tsv
This produces:
make    time_min  time_mean
ford    122       127
chevy   124       124
bmw     118       122
Using '--group-by 1,2' will group by both 'make' and 'color'. Omitting
'--group-by' entirely summarizes fields for the full file.
The program tries to generate useful headers, but custom headers can be
specified. Example (using -H and -g shortcuts for --header and --group-by):
$ tsv-summarize -H -g 1 --min 3:fastest --mean 3:average data.tsv
Most operators take custom headers in a similar way, generally following:
--<operator-name> FIELD[:header]
Operators can be specified multiple times. They can also take multiple
fields (though not when a custom header is specified). Examples:
--median 2,3,4
--median 2-5,7-11
The quantile operator requires one or more probabilities after the fields:
--quantile 2:0.25 // Quartile 1 (Q1) of field 2
--quantile 2-4:0.25,0.5,0.75 // Q1, Median, Q3 of fields 2, 3, 4
Summarization operators available are:
count       range        mad            values
retain      sum          var            unique-values
first       mean         stddev         unique-count
last        median       mode           missing-count
min         quantile     mode-count     not-missing-count
max
Numeric values are printed to 12 significant digits by default. This can be
changed using the '--p|float-precision' option. If the value is six or less,
it sets the number of significant digits after the decimal point. If it is
greater than six, it sets the total number of significant digits.
Calculations hold onto the minimum data needed while reading data. A few
operations like median keep all data values in memory. These operations will
start to encounter performance issues as available memory becomes scarce. The
size that can be handled effectively is machine dependent, but often quite
large files can be handled.
Operations requiring numeric entries will signal an error and terminate
processing if a non-numeric entry is found.
Missing values are not treated specially by default. This can be changed
using the '--x|exclude-missing' or '--r|replace-missing' option. The former
turns off processing for missing values; the latter uses a replacement value.
Options:
EOS";
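The grouping behavior described in the help text can be illustrated with a small standalone sketch. This is not tsv-summarize's actual implementation; the names `MinMean` and `summarizeMinMean` are hypothetical, and emitting keys in first-seen order is an assumption chosen to match the example output. It shows why operators such as min and mean only need a small running state per unique key, unlike median, which must retain every value.

```d
import std.algorithm : min, splitter;
import std.array : array;
import std.conv : to;
import std.format : format;
import std.stdio : writeln;

/// Hypothetical per-key accumulator: keeps only the running minimum,
/// sum, and count, so memory use is independent of the number of rows.
struct MinMean
{
    double minValue = double.infinity;
    double total = 0;
    size_t count = 0;

    void add(double x)
    {
        minValue = min(minValue, x);
        total += x;
        ++count;
    }

    double mean() const
    {
        return total / count;
    }
}

/// Groups tab-separated rows by field 1 and reports the min and mean
/// of field 3 (1-based field numbers, as in the help text).
string[] summarizeMinMean(string[] rows)
{
    MinMean[string] groups;  // one accumulator per unique key
    string[] keyOrder;       // assumption: emit keys in first-seen order

    foreach (row; rows)
    {
        auto fields = row.splitter('\t').array;
        string key = fields[0];
        if (key !in groups)
        {
            keyOrder ~= key;
            groups[key] = MinMean.init;
        }
        // to!double throws on non-numeric input, mirroring the documented
        // behavior of terminating when a non-numeric entry is found.
        groups[key].add(fields[2].to!double);
    }

    string[] lines;
    foreach (key; keyOrder)
        lines ~= format("%s\t%.12g\t%.12g", key,
                        groups[key].minValue, groups[key].mean());
    return lines;
}

void main()
{
    // Data rows from the data.tsv example (header line omitted).
    string[] rows = [
        "ford\tblue\t131",
        "chevy\tgreen\t124",
        "ford\tred\t128",
        "bmw\tblack\t118",
        "bmw\tblack\t126",
        "ford\tblue\t122",
    ];

    writeln("make\ttime_min\ttime_mean");
    foreach (line; summarizeMinMean(rows))
        writeln(line);
}
```

Run on the sample rows, the sketch reproduces the three summary rows shown in the help text (ford 122 127, chevy 124 124, bmw 118 122).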