Bernoulli sampling on the input stream. Each input line is assigned a random
value and is output if that value is less than the inclusion probability. The
order of the lines is not changed.
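
A minimal sketch of the per-line test described above, written in Python for
illustration only. The function name `bernoulli_sample` and its parameters are
hypothetical and are not taken from the original source.

```python
import random
from typing import Iterable, Iterator


def bernoulli_sample(
    lines: Iterable[str], prob: float, rng: random.Random
) -> Iterator[str]:
    """Yield each line independently with probability `prob`, in input order.

    Hypothetical helper for illustration; not the original implementation.
    """
    for line in lines:
        # rng.random() returns a float in [0.0, 1.0), so prob = 1.0 keeps
        # every line and prob = 0.0 keeps none.
        if rng.random() < prob:
            yield line
```

One random value is drawn for every input line, whether or not the line is
kept, so the cost per line is the same at any inclusion probability.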
Note: Performance tests show that skip sampling is faster when the inclusion
probability is approximately 4-5% or less. A possible optimization is a separate
function for cases where the probability is small and the random weights are not
being output with each line. The disadvantage is that the random weight assigned
to each element would then depend on which sampling method was used, so printed
weights would no longer be consistent run-to-run.
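
For comparison, a sketch of one common form of skip sampling, again in Python
and under stated assumptions. It draws the number of lines to skip before the
next selected line from a geometric distribution, rather than drawing one
random value per line. The name `skip_sample` is hypothetical, and this is not
necessarily the variant that was measured in the performance tests.

```python
import math
import random
from typing import Iterable, Iterator


def skip_sample(
    lines: Iterable[str], prob: float, rng: random.Random
) -> Iterator[str]:
    """Bernoulli-equivalent sampling that draws skip counts, not per-line values.

    Hypothetical helper for illustration; not the original implementation.
    """
    if prob <= 0.0:
        return
    if prob >= 1.0:
        yield from lines
        return

    # log(1 - prob), computed with log1p so very small probabilities stay accurate.
    log_q = math.log1p(-prob)

    def next_skip() -> int:
        # u lies in (0.0, 1.0], so log(u) is finite and <= 0.
        u = 1.0 - rng.random()
        # Number of lines rejected before the next accepted one
        # (geometric distribution with success probability `prob`).
        return int(math.log(u) / log_q)

    skip = next_skip()
    for line in lines:
        if skip == 0:
            yield line
            skip = next_skip()
        else:
            skip -= 1
```

Only selected lines consume a random value here, which is why no per-line
weight is available to print and why the values drawn differ from those in the
per-line approach.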