immutable auto helpText
Synopsis: tsv-sample [options] [file...]
Sample input lines or randomize their order. Several modes of operation
are available:
* Shuffling (the default): All input lines are output in random order. All
orderings are equally likely.
* Random sampling (--n|num N): A random sample of N lines is selected and
written to standard output. By default, selected lines are written in
random order. All sample sets and orderings are equally likely. Use
--i|inorder to write the selected lines in the original input order.
* Weighted random sampling (--n|num N, --w|weight-field F): A weighted
sample of N lines is produced. Weights are taken from field F. Lines are
output in weighted selection order. Use --i|inorder to write in original
input order. Omit --n|num to shuffle all lines (weighted shuffling).
* Sampling with replacement (--r|replace, --n|num N): All input lines are
read in, then lines are repeatedly selected at random and written out.
This continues until N lines are output. Individual lines can be written
multiple times. Output continues forever if N is zero or not provided.
* Bernoulli sampling (--p|prob P): A random subset of lines is selected
based on probability P, a 0.0-1.0 value. This is a streaming operation.
A decision is made on each line as it is read. Line order is not changed.
* Distinct sampling (--k|key-fields F, --p|prob P): Input lines are sampled
based on the values in the key fields. A subset of keys is chosen based
on the inclusion probability (a 'distinct' set of keys). All lines with
one of the selected keys are output. Line order is not changed.
Fields are specified using field number or field name. Field names require
that the input file has a header line.
Use '--help-verbose' for detailed information.
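
The difference between Bernoulli sampling and distinct sampling can be
illustrated with a minimal Python sketch. This is not the tool's
implementation. The function names, the 0-based key_index parameter, the
salt parameter, and the use of SHA-256 are all assumptions made for
illustration. Bernoulli sampling makes an independent random decision per
line, while distinct sampling makes one deterministic decision per key, so
all lines sharing a key are kept or dropped together.

```python
import hashlib
import random


def bernoulli_sample(lines, prob, rng=None):
    """Yield each line independently with probability `prob`.

    Lines are processed one at a time (streaming), and the input
    order is preserved.
    """
    rng = rng or random.Random()
    for line in lines:
        # random() returns a float in [0.0, 1.0).
        if rng.random() < prob:
            yield line


def distinct_sample(lines, key_index, prob, salt="", delim="\t"):
    """Yield every line whose key belongs to the selected subset of keys.

    Hypothetical sketch: a key is selected when its salted hash, mapped
    to [0, 1), is below `prob`. The decision depends only on the key, so
    lines with the same key are always kept or dropped together.
    `key_index` is 0-based here, purely for illustration.
    """
    for line in lines:
        key = line.split(delim)[key_index]
        digest = hashlib.sha256((salt + key).encode("utf-8")).digest()
        # Use the top 53 bits of the digest so the result is an exact
        # float in [0, 1).
        bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
        if bucket < prob:
            yield line


if __name__ == "__main__":
    rows = ["a\t1", "b\t2", "a\t3", "c\t4", "b\t5"]
    print(list(bernoulli_sample(rows, 0.5)))
    print(list(distinct_sample(rows, 0, 0.5)))
```

In this sketch, changing the salt selects a different subset of keys, while
a fixed salt makes the distinct sample repeatable across runs.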