randomizeLines

Randomize all the lines in files or standard input.

All lines in files and/or standard input are read in and written out in random order. Both simple random sampling and weighted sampling are supported.

Input data size is limited by available memory. Disk-oriented techniques are needed for data that does not fit in memory. One example is generating random values line-by-line (à la --gen-random-inorder) and sorting the result with a disk-backed sort program such as GNU sort.

The approach used here, reading all lines into memory and then sorting, is significantly faster than reading line-by-line into a heap the way reservoir sampling does. When all lines are being randomized, both approaches have to hold all the data in memory anyway, so the heap offers no advantage.
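The sort-based idea can be sketched as follows. This is a minimal illustration for the unweighted case only, not the actual implementation: KeyedLine and shuffleBySort are hypothetical names, and the real key generation and sort direction may differ.

```d
import std.algorithm : sort;
import std.random : Random, uniform01;
import std.stdio : writeln;

/// One input line paired with the random key used to order it (illustrative type).
struct KeyedLine
{
    double key;
    string line;
}

/// Returns the lines in random order by sorting on a per-line random key.
string[] shuffleBySort(string[] lines, ref Random rng)
{
    auto keyed = new KeyedLine[](lines.length);
    foreach (i, line; lines)
        keyed[i] = KeyedLine(uniform01(rng), line);

    // Sort on the random key, largest first. This step is O(n log n).
    keyed.sort!((a, b) => a.key > b.key);

    auto result = new string[](lines.length);
    foreach (i, ref k; keyed)
        result[i] = k.line;
    return result;
}

void main()
{
    auto rng = Random(42);   // static seed: repeated runs produce the same order
    foreach (line; shuffleBySort(["a", "b", "c", "d"], rng))
        writeln(line);
}
```

Because every line carries its own random value, that value can be printed next to the line, and the same seed yields the same keys regardless of how the lines are later ordered.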

Note: The unweighted case could be sped up by using std.random.randomShuffle from the D standard library. This uses an O(n) swapping algorithm to perform the shuffle rather than the O(n log n) sort approach used here. The downsides are that the result order would not be consistent with the other routines and that random number printing does not make sense. Order consistency matters only in the rare case when multiple randomizations are being done with the same static seed.
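For comparison, the O(n) alternative mentioned in the note might look like the sketch below. It only shows what a std.random.randomShuffle call looks like; shuffleInPlace is a hypothetical wrapper and is not part of the documented API.

```d
import std.random : Random, randomShuffle;
import std.stdio : writeln;

/// Shuffles the lines in place using randomShuffle's O(n) swapping algorithm.
void shuffleInPlace(string[] lines, ref Random rng)
{
    randomShuffle(lines, rng);
}

void main()
{
    auto lines = ["a", "b", "c", "d"];
    auto rng = Random(42);   // static seed, chosen only for repeatability

    shuffleInPlace(lines, rng);

    foreach (line; lines)
        writeln(line);
}
```

No per-line random value exists in this version, which is why printing random numbers alongside the lines would have nothing to show.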

void randomizeLines(Flag!"isWeighted" isWeighted, OutputRange)
    (, auto ref OutputRange outputStream)
if (isOutputRange!(OutputRange, char))
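The signature combines a compile-time Flag parameter with an output range constrained by isOutputRange. The sketch below shows how that pattern is typically called. emitLabel is a hypothetical function invented for illustration; it takes only the output range, because the first runtime parameter of randomizeLines is not shown in the signature above.

```d
import std.array : appender;
import std.range.primitives : isOutputRange, put;
import std.stdio : write;
import std.typecons : Flag, No, Yes;

/// Hypothetical helper using the same template shape as randomizeLines.
void emitLabel(Flag!"isWeighted" isWeighted, OutputRange)
    (auto ref OutputRange outputStream)
if (isOutputRange!(OutputRange, char))
{
    // isWeighted is known at compile time, so each instantiation
    // contains only one of the two branches.
    static if (isWeighted)
        put(outputStream, "weighted\n");
    else
        put(outputStream, "unweighted\n");
}

void main()
{
    // An in-memory output range of char; any type satisfying
    // isOutputRange!(T, char) could be used instead.
    auto buffer = appender!(char[])();

    emitLabel!(Yes.isWeighted)(buffer);
    emitLabel!(No.isWeighted)(buffer);

    write(buffer.data);
}
```

The flag is passed as an explicit template argument, while the OutputRange type is inferred from the call.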
