Command line tool that identifies equivalent lines in an input stream. Equivalent
lines are identified using either the full line or a set of fields as the key. By
default, input is written to standard output, retaining only the first occurrence of
equivalent lines. There are also options for marking and numbering equivalent lines
instead of filtering out duplicates.
This tool is similar in spirit to the Unix 'uniq' tool, with some key differences.
First, the key can be composed of individual fields, not just the full line. Second,
input does not need to be sorted. (Unix 'uniq' only detects equivalent lines when
they are adjacent, hence the usual need for sorting.)
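
The default behavior described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function name `uniq_lines`, its parameters, and the use of 1-based field numbers with a TAB delimiter are assumptions made for this example. It keeps a set of keys already seen, which is why the input does not need to be sorted.

```python
def uniq_lines(lines, key_fields=None, delimiter="\t"):
    """Yield the first occurrence of each equivalent line.

    key_fields: hypothetical list of 1-based field numbers forming the key;
    None means the full line is the key.
    """
    seen = set()  # keys encountered so far, in any order of arrival
    for line in lines:
        text = line.rstrip("\n")
        if key_fields is None:
            key = text
        else:
            fields = text.split(delimiter)
            key = tuple(fields[i - 1] for i in key_fields)
        if key not in seen:
            seen.add(key)
            yield text


sample = ["a\t1", "b\t2", "a\t3", "b\t2"]
print(list(uniq_lines(sample)))                  # full line as the key
print(list(uniq_lines(sample, key_fields=[1])))  # first field as the key
```

With the full line as the key, only the exact repeat of `b\t2` is dropped. With the first field as the key, `a\t3` is also dropped because an earlier line already used the key `a`.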
There are a couple of alternatives to uniq'ing the input lines. One is to mark lines
with an equivalence ID, a counter that increases by one for each new unique key. The
other is to number lines, with each unique key having its own set of numbers.
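
The two alternatives can be sketched as follows. This is a hedged illustration rather than the tool's code: the name `annotate_lines`, the tuple output, and starting both counters at 1 are assumptions for this example. Every input line is kept, and each is paired with the equivalence ID of its key and with its occurrence number within that key.

```python
def annotate_lines(lines, key_fields=None, delimiter="\t"):
    """Yield (line, equiv_id, line_number) for every input line.

    equiv_id: assigned in order of first appearance of each key (assumed to start at 1).
    line_number: counts occurrences separately for each key (assumed to start at 1).
    """
    equiv_ids = {}  # key -> equivalence ID
    counts = {}     # key -> number of lines seen so far with this key
    for line in lines:
        text = line.rstrip("\n")
        if key_fields is None:
            key = text
        else:
            fields = text.split(delimiter)
            key = tuple(fields[i - 1] for i in key_fields)
        if key not in equiv_ids:
            equiv_ids[key] = len(equiv_ids) + 1
        counts[key] = counts.get(key, 0) + 1
        yield text, equiv_ids[key], counts[key]


sample = ["a\t1", "b\t2", "a\t3", "b\t2"]
for row in annotate_lines(sample, key_fields=[1]):
    print(row)
```

No lines are filtered out here. Keeping only the rows whose line number is 1 would reproduce the default first-occurrence behavior.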
Copyright (c) 2015-2021, eBay Inc.
Initially written by Jon Degenhardt