auto helpTextVerbose = q"EOS
Synopsis: tsv-summarize [options] file [file...]

tsv-summarize reads tabular data files (tab-separated by default), tracks
field values for each unique key, and runs summarization algorithms. Consider
the file data.tsv:

   Make    Color   Time
   ford    blue    131
   chevy   green   124
   ford    red     128
   bmw     black   118
   bmw     black   126
   ford    blue    122

The min and average times for each make are generated by the command:

   $ tsv-summarize --header --group-by Make --min Time --mean Time data.tsv

This produces:

   Make   Time_min  Time_mean
   ford   122       127
   chevy  124       124
   bmw    118       122

Using '--group-by Make,Color' will group by both 'Make' and 'Color'.
Omitting the '--group-by' entirely summarizes fields for the full file.

The previous example uses field names to identify fields. Field numbers can
be used as well. The next two commands are equivalent:

   $ tsv-summarize -H --group-by Make,Color --min Time --mean Time data.tsv
   $ tsv-summarize -H --group-by 1,2 --min 3 --mean 3 data.tsv

The program tries to generate useful headers, but custom headers can be
specified. Example (using -H and -g shortcuts for --header and --group-by):

   $ tsv-summarize -H -g 1 --min 3:Fastest --mean 3:Average data.tsv

Most operators take custom headers in a similar way, generally following:

   --<operator-name> FIELD[:header]

Operators can be specified multiple times. They can also take multiple
fields (though not when a custom header is specified). Examples:

   --median 2,3,4
   --median 2-5,7-11
   --median elapsed_time,system_time,user_time
   --median '*_time'              # Wildcard. All fields ending in '_time'.

The quantile operator requires one or more probabilities after the fields:

   --quantile run_time:0.25       # First quartile of the 'run_time' field
   --quantile 2:0.25              # First quartile of field 2
   --quantile 2-4:0.25,0.5,0.75   # Q1, Median, Q3 of fields 2, 3, 4

Summarization operators available are:
   count       range        mad            values
   retain      sum          var            unique-values
   first       mean         stddev         unique-count
   last        median       mode           missing-count
   min         quantile     mode-count     not-missing-count
   max

Calculated numeric values are printed to 12 significant digits by default.
This can be changed using the '--p|float-precision' option. If six or less,
it sets the number of significant digits after the decimal point. If greater
than six, it sets the total number of significant digits.

Calculations hold onto the minimum data needed while reading data. A few
operations like median keep all data values in memory. These operations will
start to encounter performance issues as available memory becomes scarce. The
size that can be handled effectively is machine dependent, but often quite
large files can be handled.

Operations requiring numeric entries will signal an error and terminate
processing if a non-numeric entry is found.

Missing values are not treated specially by default. This can be changed
using the '--x|exclude-missing' or '--r|replace-missing' option. The former
turns off processing for missing values, the latter uses a replacement value.

Options:
EOS";
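The grouping behavior described in the help text can be sketched in a few lines of Python. This is an illustrative model only, not the tool's actual implementation: the function name `summarize_min_mean` and the `DATA` constant are hypothetical, and the sketch assumes tab-separated input with a header row. It keeps only a running minimum, sum, and count per key, which mirrors the help text's remark that most calculations hold onto the minimum data needed.

```python
# Hypothetical sketch of '--group-by Make --min Time --mean Time'.
# Not tsv-summarize's real code; names here are illustrative assumptions.

DATA = (
    "Make\tColor\tTime\n"
    "ford\tblue\t131\n"
    "chevy\tgreen\t124\n"
    "ford\tred\t128\n"
    "bmw\tblack\t118\n"
    "bmw\tblack\t126\n"
    "ford\tblue\t122\n"
)


def summarize_min_mean(text, key_field, value_field):
    """Return (key, min, mean) tuples, one per unique key, in first-seen order."""
    lines = text.splitlines()
    header = lines[0].split("\t")
    key_idx = header.index(key_field)
    val_idx = header.index(value_field)

    # Per-key running state: [minimum, sum, count]. Dicts preserve insertion order.
    state = {}
    for line in lines[1:]:
        fields = line.split("\t")
        value = float(fields[val_idx])  # raises ValueError on non-numeric entries
        if fields[key_idx] not in state:
            state[fields[key_idx]] = [value, value, 1]
        else:
            entry = state[fields[key_idx]]
            entry[0] = min(entry[0], value)
            entry[1] += value
            entry[2] += 1

    return [(key, lo, total / count) for key, (lo, total, count) in state.items()]


result = summarize_min_mean(DATA, "Make", "Time")
print("Make\tTime_min\tTime_mean")
for make, lo, mean in result:
    print(f"{make}\t{lo:g}\t{mean:g}")
```

Run against the sample data, the sketch reports the same per-make minimum and mean values as the example output in the help text. An operator such as median could not use this running-state approach, since it needs every value for a key, which is why the help text singles it out as memory-hungry.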