auto helpTextVerbose =
q"EOS
Synopsis: tsv-join --filter-file file [options] [file...]

tsv-join matches input lines (the 'data stream') against lines from a
'filter' file. The match is based on exact comparison of one or more
'key' fields. Fields are TAB delimited by default. Input lines are read
from files or standard input. Matching lines are written to standard
output, along with any additional fields from the filter file that have
been specified. For example:

   tsv-join --filter-file filter.tsv --key-fields 1 --append-fields 5,6 data.tsv

This reads filter.tsv, creating a hash table keyed on field 1. Lines from
data.tsv are read one at a time. If field 1 is found in the hash table,
the line is written to standard output with fields 5 and 6 from the filter
file appended. In database parlance this is a "hash semi join". Note the
asymmetric relationship: records in the filter file should be unique, but
lines in the data stream (data.tsv) can repeat.

Field names can be used instead of field numbers if the files have header
lines. The following command is similar to the previous example, except
it uses field names:

   tsv-join -H -f filter.tsv -k ID --append-fields Date,Time data.tsv

tsv-join can also work as a simple filter based on the whole line. This is
the default behavior. Example:

   tsv-join -f filter.tsv data.tsv

This outputs all lines from data.tsv found in filter.tsv.

Multiple fields can be specified as keys and append fields. Field numbers
start at one; zero represents the whole line. Fields are comma separated
and ranges can be used. Example:

   tsv-join -f filter.tsv -k 1,2 --append-fields 3-7 data.tsv

The --e|exclude option can be used to exclude matched lines rather than
keep them.

The joins supported are similar to the "stream-static" joins available in
Spark Structured Streaming and "KStream-KTable" joins in Kafka. The filter
file plays the same role as the Spark static dataset or Kafka KTable.

Options:
EOS";
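// The hash semi join described in the help text above can be sketched as
// follows. This is a hypothetical minimal emulation in Python for
// illustration only, not the actual D implementation; field indices are
// 1-based to match tsv-join's numbering.

```python
def hash_semi_join(filter_lines, data_lines, key, append=()):
    """Keep data lines whose key field matches a filter line's key.

    key and append are 1-based field indices, matching tsv-join's
    numbering. Matched lines get the requested filter fields appended.
    """
    # Build a hash table from the filter file, keyed on the key field.
    # Entries map the key to the fields selected for appending.
    table = {}
    for line in filter_lines:
        fields = line.split("\t")
        table[fields[key - 1]] = [fields[i - 1] for i in append]

    # Stream the data lines; emit only those whose key is in the table,
    # with the selected filter fields appended.
    out = []
    for line in data_lines:
        fields = line.split("\t")
        extra = table.get(fields[key - 1])
        if extra is not None:
            out.append("\t".join(fields + extra))
    return out


filter_lines = ["a\tx\t1", "b\ty\t2"]
data_lines = ["a\tfoo", "c\tbar", "a\tbaz"]
print(hash_semi_join(filter_lines, data_lines, key=1, append=(3,)))
# → ['a\tfoo\t1', 'a\tbaz\t1']
```

// Note the asymmetry the help text calls out: the dict keeps one entry
// per filter key, while data lines with the same key may repeat and each
// matching occurrence is emitted.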