- bernoulliSampling
void bernoulliSampling(TsvSampleOptions cmdopt, OutputRange outputStream)
Bernoulli sampling on the input stream.
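As an illustration of the technique named above, here is a minimal Python sketch of Bernoulli sampling: each line is kept independently with a fixed probability, so the stream is processed in one pass and input order is preserved. The function name `bernoulli_sample` and its `prob` and `seed` parameters are hypothetical and are not taken from the tool's D source.

```python
import random
from typing import Iterable, Iterator, Optional


def bernoulli_sample(lines: Iterable[str], prob: float,
                     seed: Optional[int] = None) -> Iterator[str]:
    """Yield each line independently with probability `prob` (hypothetical helper)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    rng = random.Random(seed)
    for line in lines:
        # random() is uniform on [0, 1), so this test succeeds with probability `prob`.
        if rng.random() < prob:
            yield line
```

Because each decision is independent, the output size is itself random; only its expected value (`prob` times the number of input lines) is fixed.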
- bernoulliSamplingCommand
void bernoulliSamplingCommand(TsvSampleOptions cmdopt, OutputRange outputStream)
Bernoulli sampling on the input stream.
- bernoulliSkipSampling
void bernoulliSkipSampling(TsvSampleOptions cmdopt, OutputRange outputStream)
Undocumented in source. Be warned that the author may not have intended to support it.
- distinctSampling
void distinctSampling(TsvSampleOptions cmdopt, OutputRange outputStream)
Sample a subset of the unique values from the key fields.
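One common way to sample a subset of unique key values is to hash each key and keep a line only when the hash falls into a chosen bucket, so all lines sharing a key are kept or dropped together. The Python sketch below assumes that approach; the name `distinct_sample`, the use of SHA-256, and the `seed` handling are illustrative assumptions, not the tool's documented behavior.

```python
import hashlib
from typing import Iterable, Iterator, Sequence


def distinct_sample(lines: Iterable[str], key_fields: Sequence[int], prob: float,
                    delim: str = "\t", seed: int = 0) -> Iterator[str]:
    """Keep every line whose key hashes into bucket 0 (hypothetical helper)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    # Roughly one key in `buckets` is selected.
    buckets = max(1, round(1.0 / prob))
    for line in lines:
        fields = line.split(delim)
        key = delim.join(fields[i] for i in key_fields)
        # A stable hash is needed; Python's built-in hash() is salted per process.
        digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
        if int.from_bytes(digest[:8], "big") % buckets == 0:
            yield line
```

The decision depends only on the key and the seed, so repeated occurrences of a key never disagree, and no table of previously seen keys has to be held in memory.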
- generateWeightedRandomValuesInorder
void generateWeightedRandomValuesInorder(TsvSampleOptions cmdopt, OutputRange outputStream)
Generates weighted random values for all input lines, preserving input order.
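A standard way to produce weighted random values is the Efraimidis-Spirakis key, `u ** (1 / w)` for a uniform `u` and weight `w`; larger weights push the value toward 1. The Python sketch below assumes that formula and emits one value per line in input order. The name `weighted_random_values`, the `weight_field` parameter, and the choice to assign 0.0 to non-positive weights are assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, Optional, Tuple


def weighted_random_values(lines: Iterable[str], weight_field: int,
                           delim: str = "\t",
                           seed: Optional[int] = None) -> Iterator[Tuple[float, str]]:
    """Yield (random_value, line) pairs in input order (hypothetical helper)."""
    rng = random.Random(seed)
    for line in lines:
        weight = float(line.split(delim)[weight_field])
        u = rng.random()
        # Assumption: lines with a non-positive weight get the lowest possible value.
        value = u ** (1.0 / weight) if weight > 0 else 0.0
        yield value, line
```

Sorting the pairs by value in descending order afterwards would give a weighted random ordering of the lines.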
- getFieldValue
T getFieldValue(C[] line, size_t fieldIndex, C delim, string filename, size_t lineNum)
Undocumented in source. Be warned that the author may not have intended to support it.
- identifyFileLines
InputLine!hasRandomValue[] identifyFileLines(FileData[] fileData, TsvSampleOptions cmdopt, OutputRange outputStream)
identifyFileLines is used by algorithms that read all files into memory prior to
processing. It does the initial processing of the file data.
- main
int main(string[] cmdArgs)
Undocumented in source. Be warned that the author may not have intended to support it.
- randomizeLinesCommand
void randomizeLinesCommand(TsvSampleOptions cmdopt, OutputRange outputStream)
Randomize all the lines in files or standard input.
- randomizeLinesViaShuffle
void randomizeLinesViaShuffle(TsvSampleOptions cmdopt, OutputRange outputStream)
Randomize all the lines in files or standard input.
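The shuffle-based approach can be illustrated with a Fisher-Yates shuffle over lines held in memory. This is a minimal Python sketch of that general technique; the name `shuffle_lines` and its `seed` parameter are hypothetical.

```python
import random
from typing import Iterable, List, Optional


def shuffle_lines(lines: Iterable[str], seed: Optional[int] = None) -> List[str]:
    """Return the lines in a uniformly random order (hypothetical helper)."""
    rng = random.Random(seed)
    buf = list(lines)  # all input must be in memory before shuffling
    # Fisher-Yates: walk from the end, swapping each slot with a random earlier one.
    for i in range(len(buf) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        buf[i], buf[j] = buf[j], buf[i]
    return buf
```

The shuffle runs in linear time and produces every permutation with equal probability, at the cost of buffering the whole input.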
- randomizeLinesViaSort
void randomizeLinesViaSort(TsvSampleOptions cmdopt, OutputRange outputStream)
Randomize all the lines in files or standard input.
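The sort-based approach can be sketched as: attach a random value to every line, then sort by that value. The Python below is a minimal illustration under that assumption; `randomize_via_sort` and its `seed` parameter are hypothetical names, and descending order is an arbitrary choice here.

```python
import random
from typing import Iterable, List, Optional


def randomize_via_sort(lines: Iterable[str], seed: Optional[int] = None) -> List[str]:
    """Order lines by an independently drawn random key (hypothetical helper)."""
    rng = random.Random(seed)
    # Pair each line with a uniform random key drawn in input order.
    keyed = [(rng.random(), line) for line in lines]
    # Sort on the key only, so line content never influences the order.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in keyed]
```

Sorting costs O(n log n) rather than the O(n) of a shuffle, but the same structure extends naturally to weighted ordering by swapping in a weighted key.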
- reservoirSamplingAlgorithmR
void reservoirSamplingAlgorithmR(TsvSampleOptions cmdopt, OutputRange outputStream)
Reservoir sampling, Algorithm R.
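For reference, the textbook form of Algorithm R can be sketched in Python as follows. It keeps the first `k` lines, then replaces a random reservoir slot with decreasing probability as more lines arrive. The name `reservoir_algorithm_r` and its parameters are hypothetical, and this is the classic algorithm rather than a transcription of the tool's source.

```python
import random
from typing import Iterable, List, Optional


def reservoir_algorithm_r(lines: Iterable[str], k: int,
                          seed: Optional[int] = None) -> List[str]:
    """Uniformly sample up to k lines from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill phase: the first k lines always enter the reservoir.
            reservoir.append(line)
        else:
            # Line i (0-based) is kept with probability k / (i + 1).
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = line
    return reservoir
```

Memory use is bounded by `k` regardless of input size, and the result is not in input order once replacements have occurred.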
- reservoirSamplingCommand
void reservoirSamplingCommand(TsvSampleOptions cmdopt, OutputRange outputStream)
Reservoir sampling on the input stream.
- reservoirSamplingViaHeap
void reservoirSamplingViaHeap(TsvSampleOptions cmdopt, OutputRange outputStream)
Reservoir sampling using a heap. Both weighted and unweighted random sampling are
supported.
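A heap-based reservoir can be sketched as follows: give every line a random key, and keep the `k` largest keys in a min-heap so the smallest retained key is always at the top and cheap to evict. The Python below assumes the Efraimidis-Spirakis key `u ** (1 / w)` for the weighted case, with weight 1.0 standing in for unweighted sampling. The name `reservoir_via_heap`, the `(line, weight)` input shape, and the treatment of non-positive weights are illustrative assumptions.

```python
import heapq
import random
from typing import Iterable, List, Optional, Tuple


def reservoir_via_heap(items: Iterable[Tuple[str, float]], k: int,
                       seed: Optional[int] = None) -> List[str]:
    """Sample up to k lines; `items` yields (line, weight) pairs (hypothetical helper)."""
    if k <= 0:
        return []
    rng = random.Random(seed)
    heap: List[Tuple[float, int, str]] = []  # min-heap ordered by key, then index
    for index, (line, weight) in enumerate(items):
        u = rng.random()
        # Assumption: non-positive weights get the lowest possible key.
        key = u ** (1.0 / weight) if weight > 0 else 0.0
        entry = (key, index, line)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Evict the smallest retained key in O(log k).
            heapq.heapreplace(heap, entry)
    # Report lines from largest key to smallest.
    return [line for _, _, line in sorted(heap, reverse=True)]
```

For unweighted sampling, pass a weight of 1.0 for every line, for example `reservoir_via_heap(((line, 1.0) for line in lines), k)`.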
- simpleRandomSamplingWithReplacement
void simpleRandomSamplingWithReplacement(TsvSampleOptions cmdopt, OutputRange outputStream)
Simple random sampling with replacement.
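Sampling with replacement means each draw picks uniformly from the full set of lines, so the same line can appear more than once and the sample can be larger than the input. A minimal Python sketch follows; `sample_with_replacement` and its `num` parameter are hypothetical names.

```python
import random
from typing import Iterable, List, Optional


def sample_with_replacement(lines: Iterable[str], num: int,
                            seed: Optional[int] = None) -> List[str]:
    """Draw `num` lines uniformly at random, allowing repeats (hypothetical helper)."""
    rng = random.Random(seed)
    buf = list(lines)  # every line must stay available for every draw
    if not buf:
        return []
    # Each draw is independent of the earlier ones.
    return [buf[rng.randrange(len(buf))] for _ in range(num)]
```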
- testTsvSample
void testTsvSample(string[] cmdArgs, string[][] expected)
Undocumented in source. Be warned that the author may not have intended to support it.
- tsvSample
void tsvSample(TsvSampleOptions cmdopt, OutputRange outputStream)
Invokes the appropriate sampling routine based on the command line arguments.
Command line tool for randomizing or sampling lines from input streams. Several sampling methods are available, including simple random sampling, weighted random sampling, Bernoulli sampling, and distinct sampling.
Copyright (c) 2017-2018, eBay Software Foundation
Initially written by Jon Degenhardt