Command line tool for randomizing or sampling lines from input streams. Several sampling methods are available, including simple random sampling, weighted random sampling, Bernoulli sampling, and distinct sampling.
Copyright (c) 2017-2019, eBay Software Foundation. Initially written by Jon Degenhardt.
HasRandomValue is a boolean flag used at compile time by identifyFileLines to distinguish use cases needing random value assignments from those that don't.
Bernoulli sampling of lines from the input stream.
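As an illustration of the idea, here is a minimal Python sketch of Bernoulli sampling: each line is kept independently with a fixed probability. It is not the tool's code; the function name `bernoulli_sample` and its parameters are hypothetical and chosen only for this example.

```python
import random


def bernoulli_sample(lines, prob, seed=None):
    """Yield each line independently with probability `prob`.

    Hypothetical helper for illustration; not part of the tool itself.
    """
    rng = random.Random(seed)
    for line in lines:
        # random() returns a value in [0.0, 1.0), so prob=1.0 keeps every
        # line and prob=0.0 keeps none.
        if rng.random() < prob:
            yield line


if __name__ == "__main__":
    for line in bernoulli_sample(["a", "b", "c", "d", "e"], 0.5, seed=1):
        print(line)
```

Input order is preserved, and the number of lines in the output varies from run to run unless a seed is fixed.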
Invokes the appropriate Bernoulli sampling routine based on the command line arguments.
bernoulliSkipSampling is an implementation of Bernoulli sampling using skips.
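One common way to implement Bernoulli sampling with skips is to draw the number of lines to skip from a geometric distribution, so that only one random number is needed per selected line rather than one per input line. The Python sketch below assumes that approach; it is not a transcription of bernoulliSkipSampling, and the name `bernoulli_skip_sample` is hypothetical.

```python
import math
import random


def bernoulli_skip_sample(lines, prob, seed=None):
    """Bernoulli sampling that draws skip counts instead of per-line values.

    Hypothetical sketch: assumes geometric skip lengths, P(skip = k) = q^k * p.
    """
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must be strictly between 0 and 1")
    rng = random.Random(seed)
    log_q = math.log1p(-prob)  # log(1 - prob), always negative here

    def next_skip():
        u = 1.0 - rng.random()  # uniform in (0.0, 1.0], so log(u) is defined
        return int(math.log(u) / log_q)  # number of lines to pass over

    skip = next_skip()
    for line in lines:
        if skip == 0:
            yield line
            skip = next_skip()
        else:
            skip -= 1
```

The saving is largest when the sampling probability is small, since most lines are then passed over without generating a random value.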
Sample a subset of lines by choosing a random set of values from key fields.
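Distinct sampling can be sketched by hashing the key and keeping a line when the hash falls below a threshold, so all lines sharing a key are kept or dropped together. The Python example below assumes that scheme and uses BLAKE2b as the hash. The hash function, the `salt` parameter, and the name `distinct_sample` are assumptions for illustration, not details taken from the tool.

```python
import hashlib


def distinct_sample(rows, key_index, prob, salt=b""):
    """Keep all rows whose key hashes into the selected fraction of hash space.

    Hypothetical sketch; `rows` is an iterable of lists of string fields.
    """
    threshold = int(prob * 2**64)  # portion of the 64-bit hash space to keep
    for row in rows:
        key = row[key_index].encode("utf-8")
        digest = hashlib.blake2b(key, digest_size=8, key=salt).digest()
        if int.from_bytes(digest, "big") < threshold:
            yield row


if __name__ == "__main__":
    rows = [["alice", "1"], ["bob", "2"], ["alice", "3"], ["carol", "4"]]
    for row in distinct_sample(rows, 0, 0.5):
        print("\t".join(row))
```

Because selection depends only on the key, roughly the requested fraction of distinct keys is chosen, and changing the salt selects a different set of keys.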
Write a floating point random value to an output stream.
Generates weighted random values for all input lines, preserving input order.
identifyFileLines is used by algorithms that read all files into memory prior to processing. It does the initial processing of the file data.
This routine is invoked when all input lines are being randomized. It selects the appropriate function and template instantiation based on the command line arguments.
Randomize all the lines in files or standard input using a shuffling algorithm.
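A standard shuffling algorithm for this purpose is the Fisher-Yates shuffle. The Python sketch below assumes that algorithm; the summary above does not say which shuffle the tool uses, and the name `shuffle_lines` is hypothetical.

```python
import random


def shuffle_lines(lines, seed=None):
    """Return a new list with the lines in uniformly random order.

    Hypothetical sketch using the Fisher-Yates shuffle.
    """
    rng = random.Random(seed)
    result = list(lines)  # copy, so the caller's data is left untouched
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        result[i], result[j] = result[j], result[i]
    return result
```

This requires all lines to be in memory, and runs in time linear in the number of lines.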
Randomize all the lines in files or standard input using assigned random weights and sorting.
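The sort-based approach can be sketched as follows: give every line a random value, then sort by that value. For the weighted case, the sketch assumes the well-known scheme of Efraimidis and Spirakis, in which a line with weight w gets the value u^(1/w) for a uniform random u. That formula and the name `randomize_by_sort` are assumptions for illustration, not details stated above.

```python
import random


def randomize_by_sort(lines, weights=None, seed=None):
    """Randomize line order by assigning random values and sorting on them.

    Hypothetical sketch; weights, if given, must be positive numbers.
    """
    rng = random.Random(seed)
    lines = list(lines)
    if weights is None:
        keyed = [(rng.random(), line) for line in lines]
    else:
        # 1.0 - random() lies in (0.0, 1.0]; larger weights push the value
        # toward 1.0, so heavier lines tend to sort earlier.
        keyed = [((1.0 - rng.random()) ** (1.0 / w), line)
                 for line, w in zip(lines, weights)]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in keyed]
```

Sorting costs more than shuffling, but it extends naturally to weighted randomization and to printing the assigned random values.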
Reservoir sampling via Algorithm R.
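Algorithm R keeps the first k lines, then replaces a random reservoir slot with the i-th line (counting from zero) with probability k / (i + 1). A minimal Python sketch follows; the name `reservoir_sample_r` is hypothetical.

```python
import random


def reservoir_sample_r(lines, k, seed=None):
    """Select k lines uniformly at random in one pass (Algorithm R).

    Hypothetical sketch; returns fewer than k lines if the input is shorter.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = line
    return reservoir
```

Only k lines are held in memory at any time, so the input can be an arbitrarily long stream. The output is a uniform random subset, but not in uniformly random order.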
Invokes the appropriate reservoir sampling routine based on the command line arguments.
Reservoir sampling using a heap. Both weighted and unweighted random sampling are supported.
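A heap-based reservoir can be sketched by assigning each line a random value and keeping the k largest values in a min-heap. The weighted case below assumes the u^(1/w) value assignment of Efraimidis and Spirakis. The name `reservoir_sample_heap` and its parameters are hypothetical.

```python
import heapq
import random


def reservoir_sample_heap(lines, k, weights=None, seed=None):
    """Keep the k lines with the largest random values, using a min-heap.

    Hypothetical sketch; weights, if given, is a sequence of positive numbers
    aligned with the lines.
    """
    if k <= 0:
        return []
    rng = random.Random(seed)
    heap = []  # entries are (value, index, line); the smallest value is heap[0]
    for index, line in enumerate(lines):
        u = 1.0 - rng.random()  # uniform in (0.0, 1.0]
        value = u if weights is None else u ** (1.0 / weights[index])
        if len(heap) < k:
            heapq.heappush(heap, (value, index, line))
        elif value > heap[0][0]:
            # Evict the current smallest value in favor of the new line.
            heapq.heapreplace(heap, (value, index, line))
    # Largest value first, which also gives a randomized output order.
    return [line for _, _, line in sorted(heap, reverse=True)]
```

Each line costs at most O(log k) heap work, and the same code path serves both weighted and unweighted sampling.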
Simple random sampling with replacement.
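Sampling with replacement can be sketched by holding all lines in memory and repeatedly picking a random index, so the same line may appear more than once. The Python example below is an illustration only; the name `sample_with_replacement` is hypothetical.

```python
import random


def sample_with_replacement(lines, num, seed=None):
    """Draw `num` lines uniformly at random, allowing repeats.

    Hypothetical sketch; returns an empty list for empty input.
    """
    rng = random.Random(seed)
    pool = list(lines)
    if not pool:
        return []
    return [pool[rng.randrange(len(pool))] for _ in range(num)]
```

Unlike the other methods, the requested sample size may exceed the number of input lines.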
Invokes the appropriate sampling routine based on the command line arguments.
A container and reader of data from a file or standard input.
An InputLine array is returned by identifyFileLines to represent each non-header line found in a FileData array. The 'data' element contains the line. A 'randomValue' element is included if random values are being generated.
Container for command line options and derived data.