Shuffling command handler. Invokes the appropriate shuffle (line order
randomization) routine based on the command line arguments.
Shuffling has similarities to random sampling, but the algorithms used are
different. Random sampling selects a subset, so only the current subset selection
needs to be kept in memory. This is supported by reservoir sampling. By contrast,
shuffling needs to hold all input in memory, so it works better to read all lines
into memory at once and then shuffle.
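
To illustrate the memory difference described above, here is a minimal Python
sketch of reservoir sampling (the classic "Algorithm R"). It is not the code
used by this handler. The function name `reservoir_sample` and its parameters
are hypothetical, chosen only for illustration.

```python
import random


def reservoir_sample(lines, k, rng=random):
    """Return a uniform random sample of up to k items from an iterable.

    Hypothetical illustration: only the k-item reservoir is held in memory,
    no matter how many lines the input contains.
    """
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Pick a slot in [0, i]; the new line replaces a reservoir
            # entry only when the slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoir
```

A shuffle cannot use this trick, since every input line appears in the output
and the last line read may end up first.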
Two different algorithms are used. Array shuffling is used for unweighted shuffling.
Assigning each line a random weight and then sorting by that weight is used for
weighted shuffling and when compatibility mode is in use.
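
The two approaches can be sketched in Python as follows. This is an
illustration under stated assumptions, not this handler's implementation. The
array shuffle is shown as a Fisher-Yates shuffle. The weighted variant assumes
the common random-key scheme in which a line with weight w gets the key
u^(1/w), with u drawn uniformly from (0, 1], and lines are sorted by descending
key. The function names `shuffle_unweighted` and `shuffle_weighted` are
hypothetical.

```python
import random


def shuffle_unweighted(lines, rng=random):
    """Fisher-Yates shuffle of a copy of the input (hypothetical sketch)."""
    result = list(lines)  # all lines are held in memory
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform index in [0, i], inclusive
        result[i], result[j] = result[j], result[i]
    return result


def shuffle_weighted(lines, weights, rng=random):
    """Weighted shuffle via random keys plus a sort (hypothetical sketch).

    Assumption: each line gets the key u ** (1 / w), so lines with larger
    weights tend to appear earlier in the output.
    """
    keyed = []
    for line, w in zip(lines, weights):
        if w <= 0:
            raise ValueError("weights must be positive")
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids u == 0
        keyed.append((u ** (1.0 / w), line))
    # Sort by key only, largest key first.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in keyed]
```

Both functions return a permutation of the full input, which is why both need
the whole input in memory before producing any output.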
The algorithms used here are all limited by available memory.