helpText
immutable auto helpText = q"EOS
Synopsis: tsv-sample [options] [file...]
Sample input lines or randomize their order. Several modes of operation
are available:
* Line order randomization (the default): All input lines are output in a
random order. All orderings are equally likely.
* Weighted line order randomization (--w|weight-field): Lines are selected
using weighted random sampling, with the weight taken from a field.
Lines are output in weighted selection order, reordering the lines.
* Sampling with replacement (--r|replace, --n|num): All input is read into
memory, then lines are repeatedly selected at random and written out. This
continues until --n|num samples are output. Lines can be selected multiple
times. Output continues forever if --n|num is zero or not specified.
* Bernoulli sampling (--p|prob): A random subset of lines is output based
on an inclusion probability. This is a streaming operation. A selection
decision is made on each line as it is read. Line order is not changed.
* Distinct sampling (--k|key-fields, --p|prob): Input lines are sampled
based on the values in the key field. A subset of the keys is chosen
based on the inclusion probability (a 'distinct' set of keys). All lines
with one of the selected keys are output. Line order is not changed.
The '--n|num' option limits the sample size produced. It speeds up line
order randomization and weighted sampling significantly. It is also used
to terminate sampling with replacement.
Use '--help-verbose' for detailed information.
Options:
EOS";
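
The two streaming modes described in the help text, Bernoulli sampling and distinct sampling, can be illustrated with a minimal sketch. This is not the tsv-sample implementation: the function names `bernoulliSample` and `distinctSample`, the zero-based `keyIndex` parameter, the bucket count, and the use of D's built-in `hashOf` are all assumptions made for illustration.

```d
import std.array : split;
import std.random : Random, uniform01, unpredictableSeed;
import std.stdio : writeln;

/// Bernoulli sampling (hypothetical helper): each line is kept
/// independently with probability `prob`. Input order is preserved.
string[] bernoulliSample(string[] lines, double prob, ref Random rng)
{
    string[] kept;
    foreach (line; lines)
    {
        // uniform01 returns a value in [0, 1).
        if (uniform01(rng) < prob) kept ~= line;
    }
    return kept;
}

/// Distinct sampling (hypothetical helper): the decision depends only on
/// the key, so all lines sharing a key are kept or dropped together.
/// `keyIndex` is zero-based here, an assumption of this sketch.
string[] distinctSample(string[] lines, size_t keyIndex, double prob, size_t seed)
{
    enum size_t buckets = 10_000;   // resolution of the probability, assumed
    immutable threshold = cast(size_t)(prob * buckets);
    string[] kept;
    foreach (line; lines)
    {
        immutable key = line.split('\t')[keyIndex];
        // Hash the key with a per-run seed and keep keys in the low buckets.
        if (hashOf(key, seed) % buckets < threshold) kept ~= line;
    }
    return kept;
}

void main()
{
    auto rng = Random(unpredictableSeed);
    string[] lines = ["a\t1", "b\t2", "a\t3", "c\t4", "b\t5"];
    writeln(bernoulliSample(lines, 0.5, rng));
    writeln(distinctSample(lines, 0, 0.5, unpredictableSeed));
}
```

Both functions make one pass over the input and never reorder it, which matches the "Line order is not changed" statements in the help text. In the distinct case, hashing the key instead of drawing a random number per line is what lets the sketch keep or drop all lines for a key consistently.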