immutable auto helpTextVerbose
Synopsis: tsv-filter [options] [file...]
Filter lines of tab-delimited files via comparison tests against fields.
Multiple tests can be specified; by default they are evaluated as an AND
clause. Lines satisfying the tests are written to standard output.
Typical test syntax is '--op field:value', where 'op' is an operator,
'field' is either a field name or a field number, and 'value' is the
comparison basis. For example, '--lt length:500' tests if the 'length'
field is less than 500. A more complete example:
tsv-filter --header --gt length:50 --lt length:100 --le width:200 data.tsv
This outputs all lines from file data.tsv where the 'length' field is
greater than 50 and less than 100, and the 'width' field is less than or
equal to 200. The header line is also output.
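The semantics of this command can be sketched in Python. This is an illustration only, not tsv-filter's implementation; the function name `filter_rows` and the sample data are hypothetical.

```python
def filter_rows(text):
    """Keep rows where 50 < length < 100 and width <= 200 (tests are ANDed)."""
    lines = text.splitlines()
    header = lines[0].split("\t")
    length_idx = header.index("length")
    width_idx = header.index("width")
    kept = [header]  # the header line is passed through to the output
    for line in lines[1:]:
        row = line.split("\t")
        length = float(row[length_idx])
        width = float(row[width_idx])
        # All three tests must pass, mirroring the default AND evaluation.
        if length > 50 and length < 100 and width <= 200:
            kept.append(row)
    return kept

# Hypothetical sample data: only the first data row satisfies all tests.
sample = "length\twidth\n75\t150\n40\t150\n75\t250\n"
for row in filter_rows(sample):
    print("\t".join(row))
```

Like the numeric tests described here, this sketch raises an error on a non-numeric field rather than silently skipping the line.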
Field numbers can also be used to identify fields, and must be used when
the input file doesn't have a header line. For example:
tsv-filter --gt 1:50 --lt 1:100 --le 2:200 data.tsv
Field lists can be used to specify multiple fields at once. For example:
tsv-filter --not-blank 1-10 --str-ne 1,2,5:'--' data.tsv
tests that fields 1-10 are not blank and fields 1,2,5 are not "--".
Wildcarded field names can also be used to specify multiple fields. The
following finds lines where any field with a name ending in '_id' is empty:
tsv-filter -H --or --empty '*_id'
Use '--help-fields' for details on using field names.
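The wildcard example can be approximated in Python as follows. This sketch assumes shell-style matching via the standard `fnmatch` module, which may differ in detail from tsv-filter's own rules; the helper name `any_matching_field_empty` and the sample rows are hypothetical.

```python
from fnmatch import fnmatchcase

def any_matching_field_empty(header, row, pattern):
    """True if any field whose header name matches `pattern` is empty."""
    indices = [i for i, name in enumerate(header) if fnmatchcase(name, pattern)]
    # OR semantics: one empty matching field is enough to select the row.
    return any(row[i] == "" for i in indices)

# Hypothetical data: 'user_id' and 'order_id' match '*_id', 'name' does not.
header = ["user_id", "name", "order_id"]
rows = [["17", "ann", "204"], ["18", "bob", ""], ["", "cy", "206"]]
selected = [r for r in rows if any_matching_field_empty(header, r, "*_id")]
for r in selected:
    print("\t".join(r))
```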
Tests available include:
* Test if a field is empty (no characters) or blank (empty or whitespace only).
* Test if a field is interpretable as a number, a finite number, NaN, or Infinity.
* Compare a field to a number - Numeric equality and relational tests.
* Compare a field to a string - String equality and relational tests.
* Test if a field matches a regular expression. Case sensitive or insensitive.
* Test if a field contains a string. Sub-string search, case sensitive or insensitive.
* Test a field's character or byte length.
* Field to field comparisons - Similar to the other tests, except comparing
one field to another in the same line.
As an alternative to filtering, records can be marked to indicate if they meet
the filter criteria or not. For example, the following will add a field to each
record indicating whether the 'Color' field is a primary color:
tsv-filter -H --or --str-eq Color:Red --str-eq Color:Yellow --str-eq Color:Blue \
--label IsPrimaryColor data.tsv
Values default to '1' and '0' and can be changed using '--label-values'. The
header name passed to '--label' is ignored if headers are not being used.
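A rough Python sketch of the marking behavior is shown below. It illustrates the idea of appending a pass/fail field instead of dropping records and is not the tool's implementation; `label_rows` and its parameters are hypothetical names.

```python
def label_rows(header, rows, field, accepted, label, pass_value="1", fail_value="0"):
    """Append a column marking whether `field` equals any value in `accepted`."""
    idx = header.index(field)
    out = [header + [label]]  # the label name becomes the new header field
    for row in rows:
        mark = pass_value if row[idx] in accepted else fail_value
        out.append(row + [mark])  # every record is kept, only marked
    return out

# Hypothetical sample data.
header = ["Color", "Item"]
rows = [["Red", "apple"], ["Green", "leaf"], ["Blue", "sky"]]
labeled = label_rows(header, rows, "Color", {"Red", "Yellow", "Blue"}, "IsPrimaryColor")
for r in labeled:
    print("\t".join(r))
```

Passing different `pass_value` and `fail_value` arguments plays the role that '--label-values' plays on the command line.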
* The run is aborted if there are not enough fields in an input line.
* Numeric tests will fail and abort the run if a field cannot be interpreted as a
number. This includes fields with no text. To avoid this use '--is-numeric' or
'--is-finite' prior to the numeric test. For example, '--is-numeric 5 --gt 5:100'
ensures field 5 is numeric before running the --gt test.
* Regular expressions use the syntax defined by the D programming language, which
follows common conventions (Perl, Python, etc.). Most common forms work as expected.
* Output is buffered by default to improve performance. Use '--line-buffered' to
have each matched line immediately written out.
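The guard pattern from the numeric-test note ('--is-numeric 5 --gt 5:100') can be sketched in Python. This is an assumption-laden illustration: Python's `float()` accepts some forms (surrounding whitespace, underscores between digits) that tsv-filter may not, and the helper names `is_numeric` and `field_gt` are hypothetical.

```python
def is_numeric(text):
    """Return True if `text` parses as a float (Python's rules, an approximation)."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def field_gt(row, field_num, threshold):
    """Guarded numeric test: check numeric-ness first, then compare."""
    value = row[field_num - 1]  # field numbers are 1-based
    # Short-circuit: the comparison runs only if the guard passes,
    # so an empty or non-numeric field fails the test instead of raising.
    return is_numeric(value) and float(value) > threshold

# Hypothetical rows with field 5 numeric, empty, and non-numeric.
print(field_gt(["a", "b", "c", "d", "150"], 5, 100))
print(field_gt(["a", "b", "c", "d", ""], 5, 100))
print(field_gt(["a", "b", "c", "d", "n/a"], 5, 100))
```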