randomizeLinesViaSort

Randomize all the lines in files or standard input using assigned random weights and sorting.

All lines in files and/or standard input are read in and written out in random order. This algorithm assigns a random value to each line and sorts. This approach supports both weighted sampling and simple random sampling (unweighted).
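The assign-and-sort idea described above can be sketched in a few lines of D. This is a minimal illustration, not the library's implementation: `Entry` and `shuffleViaSort` are hypothetical names, and the weighted key `u ^^ (1 / weight)` is an assumed weighting scheme chosen for the example.

```d
import std.algorithm : sort;
import std.random : Random, uniform01;
import std.typecons : Flag, Yes, No;

/// One input line paired with its random sort key (hypothetical helper type).
struct Entry
{
    double randomValue;
    string line;
}

/// Returns the lines in random order by giving each line a random key and
/// sorting on that key. In weighted mode the key is u ^^ (1 / weight), which
/// is an assumption made for this sketch.
string[] shuffleViaSort(Flag!"isWeighted" isWeighted)(
    string[] lines, double[] weights, ref Random rng)
{
    auto entries = new Entry[](lines.length);
    foreach (i, line; lines)
    {
        immutable double u = uniform01(rng);
        static if (isWeighted)
            entries[i] = Entry(u ^^ (1.0 / weights[i]), line);
        else
            entries[i] = Entry(u, line);
    }

    // Order by key, largest first; the resulting order is the random order.
    sort!((a, b) => a.randomValue > b.randomValue)(entries);

    auto result = new string[](lines.length);
    foreach (i, ref e; entries)
        result[i] = e.line;
    return result;
}

void main()
{
    import std.stdio : writeln;

    auto rng = Random(42);
    auto lines = ["a", "b", "c", "d"];
    writeln(shuffleViaSort!(No.isWeighted)(lines, null, rng));
    writeln(shuffleViaSort!(Yes.isWeighted)(lines, [1.0, 2.0, 3.0, 4.0], rng));
}
```

Because every line and its key are held in memory before sorting, the sketch shares the memory limit of the approach it illustrates.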

This is significantly faster than heap-based reservoir sampling when the entire file is being read. See also randomizeLinesViaShuffle for the unweighted case: it is a little faster, at the cost of not supporting random value printing or compatibility mode.

Input data size is limited by available memory. Disk-oriented techniques are needed when data sizes are larger, for example generating random values line-by-line (à la --gen-random-inorder) and sorting with a disk-backed sort program like GNU sort.

void randomizeLinesViaSort(Flag!"isWeighted" isWeighted, OutputRange)(TsvSampleOptions cmdopt, auto ref OutputRange outputStream)
if (isOutputRange!(OutputRange, char))
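The template constraint in the signature means any type that accepts `char` output can serve as the output stream. The sketch below illustrates that constraint only, using a hypothetical `writeLines` function with the same constraint and a `std.array` appender as the sink. It does not call randomizeLinesViaSort, since constructing a TsvSampleOptions is outside what this page documents.

```d
import std.array : appender;
import std.range.primitives : isOutputRange, put;

/// Hypothetical function with the same output range constraint as above.
/// Writes each line followed by a newline to the output range.
void writeLines(OutputRange)(string[] lines, auto ref OutputRange outputStream)
if (isOutputRange!(OutputRange, char))
{
    foreach (line; lines)
    {
        put(outputStream, line);
        put(outputStream, '\n');
    }
}

void main()
{
    // An appender over char[] satisfies isOutputRange!(T, char).
    auto buffer = appender!(char[])();
    static assert(isOutputRange!(typeof(buffer), char));

    writeLines(["x", "y"], buffer);
    assert(buffer.data == "x\ny\n");
}
```

Collecting output in an in-memory appender like this is convenient for unit tests; a file-backed writer that satisfies the same constraint could be passed instead.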
