Synopsis: tsv-uniq [options] [file...]
tsv-uniq identifies equivalent lines in tab-separated value files. Input
is read line by line, recording a key for each line based on one or more
of the fields. Two lines are equivalent if they have the same key. The
first time a key is seen, its line is written to standard output.
Subsequent lines containing the same key are discarded. This command
uniq's a file on fields 2 and 3:

   tsv-uniq -f 2,3 file.tsv

This is similar to the Unix 'uniq' program, but based on individual
fields and without requiring sorted data.
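
The key-based filtering described above can be sketched in Python. This is
a minimal illustration of the idea, not tsv-uniq's implementation. The
function name uniq_by_fields and the sample rows are hypothetical, and
field numbers are assumed to be 1-based, as in the '-f 2,3' example.

```python
def uniq_by_fields(lines, fields):
    """Yield the first line seen for each key.

    `fields` holds 1-based field numbers (an assumption of this sketch).
    """
    seen = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # The key is the tuple of the selected field values.
        key = tuple(cols[i - 1] for i in fields)
        if key not in seen:
            seen.add(key)
            yield line


# Hypothetical, unsorted input: three lines share the key ("x", "1").
rows = ["a\tx\t1\n", "b\tx\t1\n", "c\ty\t1\n", "d\tx\t1\n"]
result = list(uniq_by_fields(rows, [2, 3]))
print("".join(result), end="")  # writes the "a" and "c" lines
```

Because seen keys are kept in a set, the input does not need to be sorted,
at the cost of holding every distinct key in memory.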
Field names can be used if the input file has a header line. This command
uniq's a file based on the 'time' and 'date' fields:

   tsv-uniq -H -f time,date file.tsv

Use '--help-fields' for details about field names.
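
One way to picture field names is as a lookup against the header line. The
Python sketch below is illustrative only: the function name uniq_by_names
and the sample data are hypothetical, and it assumes the header line is
passed through unchanged and that names match header fields exactly.

```python
def uniq_by_names(lines, names):
    """Yield the header, then the first line seen for each named-field key."""
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return  # empty input: nothing to write
    columns = header.rstrip("\n").split("\t")
    # Map each field name to its column position.
    # list.index raises ValueError if a name is not in the header.
    indices = [columns.index(name) for name in names]
    yield header
    seen = set()
    for line in it:
        cols = line.rstrip("\n").split("\t")
        key = tuple(cols[i] for i in indices)
        if key not in seen:
            seen.add(key)
            yield line


# Hypothetical input with a header line.
rows = [
    "id\ttime\tdate\n",
    "1\t09:00\t2024-01-01\n",
    "2\t09:00\t2024-01-01\n",
    "3\t10:00\t2024-01-01\n",
]
named_result = list(uniq_by_names(rows, ["time", "date"]))
print("".join(named_result), end="")  # header, then the lines with id 1 and 3
```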
tsv-uniq can be run without specifying a key field. In this case the
whole line is used as a key, same as the Unix 'uniq' program. This works
on any line-oriented text file, not just TSV files.

The above is the default behavior ('uniq' mode). The alternatives to 'uniq'
mode are 'number' mode and 'equiv-class' mode. In 'equiv-class' mode, all
lines are written to standard output, but with a field appended marking
equivalent entries with an ID. The ID is a counter that starts at 1 and
increases by one for each new key. Example:

   tsv-uniq --header -f 2,3 --equiv file.tsv
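
The ID assignment in 'equiv-class' mode can be sketched as follows. This is
a hedged illustration rather than the tool's code: the function name
equiv_class is hypothetical, IDs are assumed to start at 1, and header
handling is left out to keep the example short.

```python
def equiv_class(lines, fields):
    """Yield every line with an equivalence-class ID appended as a new field."""
    ids = {}
    for line in lines:
        text = line.rstrip("\n")
        cols = text.split("\t")
        key = tuple(cols[i - 1] for i in fields)  # 1-based field numbers
        if key not in ids:
            # A new key gets the next ID: 1, 2, 3, ...
            ids[key] = len(ids) + 1
        yield f"{text}\t{ids[key]}\n"


# Hypothetical input: the "a", "b" and "d" lines share a key.
rows = ["a\tx\t1\n", "b\tx\t1\n", "c\ty\t1\n", "d\tx\t1\n"]
equiv_result = list(equiv_class(rows, [2, 3]))
print("".join(equiv_result), end="")  # appended IDs are 1, 1, 2, 1
```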
'Number' mode also writes all lines to standard output, but with a field
appended giving the occurrence number of the line's key. The first line
with a specific key is assigned the number '1', the second line with that
key is assigned the number '2', etc. 'Number' and 'equiv-class' modes can
be combined.
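
'Number' mode amounts to keeping a running count per key. The sketch below
is an assumption-laden illustration, not the actual implementation: the
function name number_lines and the sample rows are hypothetical, and header
handling is omitted.

```python
from collections import defaultdict


def number_lines(lines, fields):
    """Yield every line with its key's occurrence number appended."""
    counts = defaultdict(int)
    for line in lines:
        text = line.rstrip("\n")
        cols = text.split("\t")
        key = tuple(cols[i - 1] for i in fields)  # 1-based field numbers
        counts[key] += 1  # 1 for the first occurrence, 2 for the second, ...
        yield f"{text}\t{counts[key]}\n"


# Hypothetical input: the "a", "b" and "d" lines share a key.
rows = ["a\tx\t1\n", "b\tx\t1\n", "c\ty\t1\n", "d\tx\t1\n"]
number_result = list(number_lines(rows, [2, 3]))
print("".join(number_result), end="")  # appended numbers are 1, 2, 1, 3
```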
The '--r|repeated' option can be used to print only lines occurring more
than once. Specifically, the second occurrence of a key is printed. The
'--a|at-least N' option is similar, printing lines occurring at least N
times. (As with '--r|repeated', only the Nth line with the key is printed.)
The '--m|max MAX' option changes the behavior to output the first MAX
lines for each key, rather than just the first line for each key.
If both '--a|at-least' and '--m|max' are specified, the occurrences
starting with 'at-least' and ending with 'max' are output.
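
The three options can be read as a window on each key's occurrence count.
The Python sketch below illustrates that reading. It is not the tool's
implementation: the function name select_occurrences and its parameters are
hypothetical, and it assumes that when only 'at-least' is given, the window
closes at the same occurrence, so that just the Nth line is printed. It
does not address a 'max' smaller than 'at-least'.

```python
from collections import defaultdict


def select_occurrences(lines, fields, at_least=1, max_count=None):
    """Yield lines whose per-key occurrence number is in [at_least, max_count]."""
    if max_count is None:
        # Assumption: without a max, only the at_least-th occurrence is written.
        max_count = at_least
    counts = defaultdict(int)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        key = tuple(cols[i - 1] for i in fields)  # 1-based field numbers
        counts[key] += 1
        if at_least <= counts[key] <= max_count:
            yield line


# Hypothetical input keyed on field 1: "k" occurs four times, "j" twice.
rows = ["k\t1\n", "k\t2\n", "j\t1\n", "k\t3\n", "k\t4\n", "j\t2\n"]

repeated = list(select_occurrences(rows, [1], at_least=2))       # like 'repeated'
first_two = list(select_occurrences(rows, [1], max_count=2))     # like 'max 2'
window = list(select_occurrences(rows, [1], at_least=2, max_count=3))
print("".join(window), end="")  # second and third "k" lines, second "j" line
```

With the defaults (at_least=1 and no max), the sketch reduces to plain
'uniq' mode: only the first line for each key is written.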