tsv_uniq

Command line tool that uses fields in a tab-separated value file to identify equivalent lines. It can either remove the duplicate entries or mark them as members of equivalence classes.

This tool reads a tab-separated value file line by line, using one or more fields to record a key. If the same key is found in a subsequent line, it is identified as equivalent. When operating in 'uniq' mode, the first time a key is seen the line is written to standard output, but subsequent matching lines are discarded.
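The 'uniq' behavior described above can be sketched in a few lines of D. This is a minimal illustration, not the tool's actual implementation: `uniqLines` is a hypothetical helper, the key fields are assumed to be zero-based indices, and the input is taken from an in-memory array rather than from files.

```d
import std.array : split;
import std.stdio : writeln;

/// Hypothetical helper (not part of tsv_uniq): returns the first line seen
/// for each key. keyFields holds zero-based field indices, which is an
/// assumption made for this sketch.
string[] uniqLines(string[] lines, size_t[] keyFields)
{
    bool[string] seen;      // keys already encountered
    string[] result;
    foreach (line; lines)
    {
        auto fields = line.split("\t");

        // Build the key by joining the selected fields with tabs.
        string key;
        foreach (n, i; keyFields)
        {
            if (n > 0) key ~= "\t";
            key ~= fields[i];
        }

        // Keep the line only the first time its key is seen.
        if (key !in seen)
        {
            seen[key] = true;
            result ~= line;
        }
    }
    return result;
}

void main()
{
    auto lines = ["a\t1\tx", "b\t2\ty", "a\t3\tz"];

    // Using the first field as the key, the third line duplicates the first.
    foreach (line; uniqLines(lines, [0]))
        writeln(line);
}
```

With the first field as the key, the sketch prints the first two lines and drops the third, because its key `a` has already been seen.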

The alternative to 'uniq' mode is 'equiv-class' identification. In this mode, all lines are written to standard output, but a new field is appended that marks equivalent entries with an ID. The ID is simply an incrementing counter: each new key receives the next value.
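The 'equiv-class' mode can be sketched in the same way. Again this is only an illustration under stated assumptions: `markEquivClasses` is a hypothetical helper, key fields are zero-based indices, and the counter is assumed to start at 1, which the text above does not specify.

```d
import std.array : split;
import std.conv : to;
import std.stdio : writeln;

/// Hypothetical helper (not part of tsv_uniq): returns every input line with
/// an equivalence class ID appended as a new last field.
string[] markEquivClasses(string[] lines, size_t[] keyFields)
{
    size_t[string] classIds;   // key -> assigned ID
    size_t nextId = 1;         // assumed starting value for the counter
    string[] result;
    foreach (line; lines)
    {
        auto fields = line.split("\t");

        // Build the key by joining the selected fields with tabs.
        string key;
        foreach (n, i; keyFields)
        {
            if (n > 0) key ~= "\t";
            key ~= fields[i];
        }

        // Reuse the ID of a known key, otherwise assign the next counter value.
        size_t id;
        if (auto p = key in classIds)
        {
            id = *p;
        }
        else
        {
            id = nextId++;
            classIds[key] = id;
        }

        result ~= line ~ "\t" ~ id.to!string;
    }
    return result;
}

void main()
{
    auto lines = ["a\t1", "b\t2", "a\t3"];
    foreach (line; markEquivClasses(lines, [0]))
        writeln(line);
}
```

Here no line is discarded: the first and third lines share the key `a` and so receive the same ID, while the second line gets its own.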

Copyright (c) 2015-2018, eBay Software Foundation
Initially written by Jon Degenhardt

Members

Functions

main
int main(string[] cmdArgs)

Main program. Processes command line arguments and calls tsvUniq which implements the main processing logic.

tsvUniq
void tsvUniq(TsvUniqOptions cmdopt, string[] inputFiles)

Outputs the unique lines from all the input files.

Structs

TsvUniqOptions
struct TsvUniqOptions

Container for command line options.

Variables

helpText
auto helpText;

Undocumented in source.

helpTextVerbose
auto helpTextVerbose;

Undocumented in source.

Meta