helpTextVerbose
auto helpTextVerbose =
q"EOS
Synopsis: tsv-uniq [options] [file...]
tsv-uniq identifies equivalent lines in tab-separated value files. Input is read
line by line, recording a key based on one or more of the fields. Two lines are
equivalent if they have the same key. When operating in 'uniq' mode, the first
time a key is seen the line is written to standard output, but subsequent lines
are discarded. This is similar to the Unix 'uniq' program, but based on individual
fields and without requiring sorted data. This command uniq's on fields 2 and 3:
    tsv-uniq -f 2,3 file.tsv
The alternative to 'uniq' mode is 'equiv-class' identification. In this mode, all
lines are written to standard output, but with a new field added marking
equivalent entries with an ID. The ID is simply a counter that starts at one and
is incremented for each new key. Example:
    tsv-uniq --header -f 2,3 --equiv file.tsv
tsv-uniq can be run without specifying a key field. In this case the whole line
is used as a key, same as the Unix 'uniq' program. This works on any line-oriented
text file, not just TSV files.
The '--r|repeated' option can be used to print only lines occurring more than
once. '--a|at-least N' is similar, except that it only prints lines occurring at
least N times. For both, the Nth line found is printed, in the order found.
The '--m|max MAX' option changes the behavior to output the first MAX lines for
each key, rather than just the first line for each key. This can also be used
with '--e|equiv' to limit the number of lines output for each equivalence class.
It's not obvious when both '--a|at-least' and '--m|max' might be useful, but, if
both are specified, the occurrences between 'at-least' and 'max' are output.
Options:
EOS";
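The selection rules described in the help text can be illustrated with a minimal sketch. This is not the tsv-uniq implementation: `uniqLines`, its parameters, and the `Entry` struct are hypothetical names introduced here, header handling is left out, and treating `max == 0` as "no upper limit" is this sketch's own convention. It assumes each occurrence of a key is counted and a line is written when its count falls between `atLeast` and `max`.

```d
import std.algorithm : map;
import std.array : join, split;
import std.conv : to;
import std.stdio : writeln;

/// Hypothetical helper, not part of tsv-uniq: selects lines by key.
/// keyFields holds 1-based field numbers; empty means the whole line is the key.
/// max == 0 is this sketch's own convention for "no upper limit".
string[] uniqLines(string[] lines, size_t[] keyFields,
                   size_t atLeast = 1, size_t max = 1, bool equivMode = false)
{
    static struct Entry { size_t id; size_t count; }

    Entry[string] seen;   // key -> equivalence ID and occurrence count
    size_t nextId = 1;    // IDs are handed out in order of first appearance
    string[] output;

    foreach (line; lines)
    {
        string key = line;
        if (keyFields.length > 0)
        {
            // Assumes every line has at least as many fields as the largest key field.
            auto fields = line.split("\t");
            key = keyFields.map!(i => fields[i - 1]).join("\t");
        }

        // Register a new key on first sight, then count this occurrence.
        auto entry = key in seen;
        if (entry is null)
        {
            seen[key] = Entry(nextId++, 0);
            entry = key in seen;
        }
        entry.count++;

        // Write the line only if this occurrence is within [atLeast, max].
        immutable inRange = entry.count >= atLeast && (max == 0 || entry.count <= max);
        if (inRange)
            output ~= equivMode ? line ~ "\t" ~ entry.id.to!string : line;
    }
    return output;
}

void main()
{
    string[] data = ["a\tx\t1", "b\tx\t1", "c\ty\t2", "d\tx\t1"];
    size_t[] key = [2, 3];

    writeln(uniqLines(data, key));               // 'uniq' mode on fields 2 and 3
    writeln(uniqLines(data, key, 2, 2));         // second occurrence only, like 'repeated'
    writeln(uniqLines(data, key, 1, 0, true));   // all lines with an ID appended, like 'equiv'
}
```

With this sample data, lines `a`, `b`, and `d` share the key `x<TAB>1`, so the first call keeps lines `a` and `c`, the second keeps only line `b`, and the third returns every line with IDs 1, 1, 2, 1 appended.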