HasRandomValue is a boolean flag used at compile time by identifyFileLines to distinguish use cases needing random value assignments from those that don't.
Bernoulli sampling of lines on the input stream.
Invokes the appropriate Bernoulli sampling routine based on the command line arguments.
bernoulliSkipSampling is an implementation of Bernoulli sampling using skips.
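A skip-based Bernoulli sampler can be sketched in Python as below. This is an illustration under stated assumptions, not the tool's implementation: the name `bernoulli_skip_sample` and its parameters are hypothetical, and the skip length is drawn from a geometric distribution, which is one standard way to jump over rejected lines instead of testing each one.

```python
import math
import random


def bernoulli_skip_sample(lines, prob, rng=None):
    """Yield each line with probability `prob`, drawing skip counts
    rather than one random number per line (hypothetical sketch)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    if rng is None:
        rng = random.Random()
    if prob == 1.0:
        # Every line is selected; no skips are needed.
        yield from lines
        return

    log_q = math.log1p(-prob)  # log(1 - prob), always negative here

    def next_skip():
        # Number of rejected lines before the next accepted one.
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
        return int(math.log(u) / log_q)

    skip = next_skip()
    for line in lines:
        if skip == 0:
            yield line
            skip = next_skip()
        else:
            skip -= 1
```

Skips pay off mainly for small probabilities, where most lines are passed over without a random draw.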
Sample a subset of the unique values from the key fields.
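Sampling by unique key value can be sketched in Python as below. This is an assumption-laden illustration rather than the tool's method: the name `distinct_sample`, the `key_index` and `seed` parameters, and the use of a BLAKE2 hash are all hypothetical. The point it shows is that the keep-or-drop decision depends only on the key, so all lines sharing a key are kept or dropped together.

```python
import hashlib


def distinct_sample(rows, key_index, prob, seed=0):
    """Yield rows whose key hashes below a threshold (hypothetical sketch).

    `rows` is an iterable of lists of strings; `seed` must be a
    non-negative integer below 2**64.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be in [0, 1]")
    threshold = int(prob * 2**64)
    salt = seed.to_bytes(8, "little")
    for row in rows:
        key = row[key_index].encode("utf-8")
        digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
        # The same key always maps to the same 64-bit value.
        if int.from_bytes(digest, "little") < threshold:
            yield row
```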
Write a floating point random value to an output stream.
Generates weighted random values for all input lines, preserving input order.
identifyFileLines is used by algorithms that read all files into memory prior to processing. It does the initial processing of the file data.
Invokes the appropriate routine to randomize input lines based on the command line arguments.
Randomize all the lines in files or standard input using a shuffling algorithm.
Randomize all the lines in files or standard input using assigned random weights and sorting.
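The weight-and-sort approach can be sketched in Python as below. This is an illustration, not the tool's implementation: the name `randomize_by_sorting` and its parameters are hypothetical, and the weighted key `u ** (1 / w)` is the Efraimidis-Spirakis formula, assumed here as one standard way to turn line weights into sortable random values.

```python
import random


def randomize_by_sorting(lines, weights=None, rng=None):
    """Return lines in random order by assigning each a random value
    and sorting on it, largest first (hypothetical sketch)."""
    if rng is None:
        rng = random.Random()
    lines = list(lines)
    if weights is None:
        # Unweighted: a plain uniform value per line.
        keyed = [(rng.random(), line) for line in lines]
    else:
        weights = list(weights)
        if len(weights) != len(lines):
            raise ValueError("one weight per line is required")
        if any(w <= 0 for w in weights):
            raise ValueError("weights must be positive")
        # Weighted: larger weights push the value toward 1.
        keyed = [(rng.random() ** (1.0 / w), line)
                 for line, w in zip(lines, weights)]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in keyed]
```

With weights supplied, lines with larger weights tend to appear earlier in the output.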
Reservoir sampling via Algorithm R.
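Algorithm R can be sketched in Python as below. The function name `reservoir_sample_r` and its parameters are hypothetical and chosen for illustration. The first `k` lines fill the reservoir, and each later line replaces a random slot with probability `k / (i + 1)`, where `i` is its zero-based position.

```python
import random


def reservoir_sample_r(lines, k, rng=None):
    """Return up to k lines chosen uniformly at random in one pass
    (hypothetical sketch of Algorithm R)."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir
```

The output order is not random: early lines that survive stay in their original slots, so a separate shuffle is needed if a random order is required.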
Invokes the appropriate reservoir sampling routine based on the command line arguments.
Reservoir sampling using a heap. Both weighted and unweighted random sampling are supported.
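A heap-based reservoir can be sketched in Python as below. This is a sketch under assumptions, not the tool's implementation: the name `reservoir_sample_heap`, the `weight` callback, and the weighted key `u ** (1 / w)` (the Efraimidis-Spirakis formula) are all hypothetical choices. A min-heap of size `k` keeps the lines with the largest random values seen so far.

```python
import heapq
import random


def reservoir_sample_heap(items, k, rng=None, weight=None):
    """Return up to k items, highest random value first
    (hypothetical sketch). `weight` maps an item to a positive number."""
    if k <= 0:
        return []
    if rng is None:
        rng = random.Random()
    heap = []  # min-heap of (score, sequence number, item)
    for seq, item in enumerate(items):
        u = rng.random()
        score = u if weight is None else u ** (1.0 / weight(item))
        entry = (score, seq, item)  # seq breaks ties without comparing items
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif score > heap[0][0]:
            # New item beats the smallest score in the reservoir.
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in sorted(heap, reverse=True)]
```

Because results are ordered by random value, the output is already in a random (or weighted random) order, unlike Algorithm R.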
Simple random sampling with replacement.
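Sampling with replacement can be sketched in Python as below. The name `sample_with_replacement` and its parameters are hypothetical. The sketch assumes all lines are read into memory first, since any line may be selected more than once.

```python
import random


def sample_with_replacement(lines, num, rng=None):
    """Return `num` lines drawn uniformly, allowing repeats
    (hypothetical sketch)."""
    if rng is None:
        rng = random.Random()
    data = list(lines)
    if not data:
        return []
    # Each draw is independent, so the same line can appear repeatedly.
    return [data[rng.randrange(len(data))] for _ in range(num)]
```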
Invokes the appropriate sampling routine based on the command line arguments.
A container and reader for data from a file or standard input.
An InputLine array is returned by identifyFileLines to represent each non-header line found in a FileData array. The 'data' element contains the line. A 'randomValue' element is included if random values are being generated.
Container for command line options.
Command line tool for randomizing or sampling lines from input streams. Several sampling methods are available, including simple random sampling, weighted random sampling, Bernoulli sampling, and distinct sampling.
Copyright (c) 2017-2018, eBay Software Foundation. Initially written by Jon Degenhardt.