InputFieldReordering

InputFieldReordering - Move select fields from an input line to an output array, reordering along the way.

The InputFieldReordering class is used to reorder a subset of fields from an input line. The caller instantiates an InputFieldReordering object at the start of input processing. The instance contains a mapping from input index to output index, plus a buffer holding the reordered fields. The caller processes each input line by calling initNewLine, splitting the line into fields, and calling processNextField on each field. The output buffer is ready when the allFieldsFilled method returns true.

Fields are not copied; instead, the output buffer points to the fields passed by the caller. The caller must use or copy the output buffer while those fields are still valid, which is normally until the next input line is read. The program below illustrates the basic use case: it reads stdin and outputs fields [3, 0, 2], in that order. (See also joinAppend, below, which is faster than the join used here.)

int main(string[] args)
{
    import tsv_utils.common.utils;
    import std.algorithm, std.array, std.range, std.stdio;
    size_t[] fieldIndices = [3, 0, 2];
    auto fieldReordering = new InputFieldReordering!char(fieldIndices);
    foreach (line; stdin.byLine)
    {
        fieldReordering.initNewLine;
        foreach(fieldIndex, fieldValue; line.splitter('\t').enumerate)
        {
            fieldReordering.processNextField(fieldIndex, fieldValue);
            if (fieldReordering.allFieldsFilled) break;
        }
        if (fieldReordering.allFieldsFilled)
        {
            writeln(fieldReordering.outputFields.join('\t'));
        }
        else
        {
            writeln("Error: Insufficient number of fields on the line.");
        }
    }
    return 0;
}
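Because outputFields only holds slices of the caller's fields, anything kept past the current line has to be copied. The standalone D sketch below shows why, without using the library: sliceVersusCopy is a hypothetical name, and the in-place overwrite only simulates the way stdin.byLine reuses its buffer. The slice sees the new contents, while the explicit .idup copy keeps the old value.

```d
import std.algorithm : splitter;
import std.array : array;
import std.stdio : writeln;

// Hypothetical demo: split a line buffer, keep one field by slice and one by
// copy, then overwrite the buffer in place (simulating byLine buffer reuse).
// Returns [what the slice sees now, the copied value].
string[2] sliceVersusCopy()
{
    char[] lineBuf = "a\tb\tc".dup;                 // stand-in for the byLine buffer
    char[][] fields = lineBuf.splitter('\t').array; // slices into lineBuf, no copying
    string saved = fields[1].idup;                  // explicit copy of field 1

    lineBuf[] = "x\ty\tz";                          // the "next line" overwrites the buffer

    return [fields[1].idup, saved];
}

void main()
{
    auto result = sliceVersusCopy();
    writeln("slice now sees: ", result[0]); // slice now sees: y
    writeln("copy still has: ", result[1]); // copy still has: b
}
```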

Field indices are zero-based. An individual field can be listed multiple times. The outputFields array is not valid until all the specified fields have been processed; the allFieldsFilled method tests for this. If a line does not have enough fields, the outputFields buffer cannot be used. For most TSV applications this is acceptable, since such a line is invalid anyway. However, if partial lines are acceptable, the template can be instantiated with EnablePartialLines.yes. This ensures that any fields not filled in are empty strings in the array returned by outputFields.
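The rules above (zero-based indices, repeated fields, and empty strings for unfilled slots when partial lines are allowed) can be illustrated with a small standalone function. This is a minimal sketch, not the library: reorderLine is a hypothetical helper, and returning null for an unusable line is this sketch's own convention.

```d
import std.algorithm : splitter;
import std.range : enumerate;
import std.stdio : writeln;

// Hypothetical helper mirroring the documented semantics: gathers the fields
// listed in `indices` (zero-based, repeats allowed) from a tab-separated line.
// If the line is too short it returns null, unless partialOk is true, in which
// case the unfilled output slots stay as empty strings.
string[] reorderLine(string line, const size_t[] indices, bool partialOk)
{
    auto output = new string[](indices.length); // each slot starts as an empty (null) string
    size_t numFilled = 0;

    foreach (fieldIndex, fieldValue; line.splitter('\t').enumerate)
    {
        foreach (outIndex, wanted; indices)
        {
            if (wanted == fieldIndex)
            {
                output[outIndex] = fieldValue; // a slice of `line`, not a copy
                ++numFilled;
            }
        }
        if (numFilled == indices.length) break;
    }

    if (numFilled < indices.length && !partialOk) return null;
    return output;
}

void main()
{
    const size_t[] repeated = [1, 0, 1];  // field 1 requested twice
    const size_t[] beyondEnd = [4, 0];    // field 4 does not exist in a 3-field line
    writeln(reorderLine("a\tb\tc", repeated, false));          // ["b", "a", "b"]
    writeln(reorderLine("a\tb\tc", beyondEnd, false) is null); // true
    writeln(reorderLine("a\tb\tc", beyondEnd, true));          // ["", "a"]
}
```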

Constructors

this
this(size_t[] inputFieldIndicies, size_t start)
Undocumented in source.

Members

Aliases

TupleFromTo
alias TupleFromTo = Tuple!(size_t, "from", size_t, "to")
Undocumented in source.
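TupleFromTo is undocumented, but its field names suggest a pair of an input field index ("from") and an output position ("to"). Assuming that reading, the sketch below builds such pairs for the indices [3, 0, 2] from the example above and sorts them by input index, which would let a single left-to-right pass over a line's fields fill every output slot. buildFromToMap is a hypothetical name, and the sorted-map design is an assumption rather than something this page confirms.

```d
import std.algorithm : sort;
import std.stdio : writeln;
import std.typecons : Tuple;

alias TupleFromTo = Tuple!(size_t, "from", size_t, "to");

// Hypothetical helper: pairs each input field index ("from") with its position
// in the output array ("to"), then sorts by input index. Tuples compare
// field by field, so ties on "from" (repeated fields) are ordered by "to".
TupleFromTo[] buildFromToMap(const size_t[] inputFieldIndices)
{
    TupleFromTo[] fromTo;
    foreach (to, from; inputFieldIndices)
        fromTo ~= TupleFromTo(from, to);
    fromTo.sort();
    return fromTo;
}

void main()
{
    const size_t[] indices = [3, 0, 2];
    foreach (pair; buildFromToMap(indices))
        writeln("input ", pair.from, " -> output ", pair.to);
    // input 0 -> output 1
    // input 2 -> output 2
    // input 3 -> output 0
}
```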

Functions

allFieldsFilled
bool allFieldsFilled()

allFieldsFilled returns true if all expected fields have been processed.

initNewLine
void initNewLine()

initNewLine initializes the object for a new line.

outputFields
C[][] outputFields()

outputFields returns the assembled output fields. Unless partial lines are enabled, it is only valid after allFieldsFilled returns true.

processNextField
size_t processNextField(size_t fieldIndex, C[] fieldValue)

processNextField maps an input field to the correct locations in the outputFields array.
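To show how the four functions could fit together, here is a minimal, hypothetical re-implementation of the documented interface. FieldReorderSketch and reorderOneLine are illustrative names, not part of tsv_utils. The sketch assumes a sorted (input, output) index map, that fields are passed in increasing fieldIndex order as in the example above, and that processNextField returns the number of output slots the field fills; this page does not document that return value.

```d
import std.algorithm : sort, splitter;
import std.array : join;
import std.range : enumerate;
import std.stdio : writeln;
import std.typecons : Tuple;

// Hypothetical sketch of the documented interface; this is not the tsv_utils source.
final class FieldReorderSketch(C)
{
    alias FromTo = Tuple!(size_t, "from", size_t, "to");

    private C[][] outputBuf;  // slices of the caller's fields, never copies
    private FromTo[] fromTo;  // (input index, output index), sorted by input index
    private FromTo[] pending; // mappings not yet satisfied on the current line

    this(const size_t[] inputFieldIndices)
    {
        outputBuf = new C[][](inputFieldIndices.length);
        foreach (to, from; inputFieldIndices)
            fromTo ~= FromTo(from, to);
        fromTo.sort();
        initNewLine();
    }

    // Resets the pending mappings and clears slices left over from the previous line.
    void initNewLine()
    {
        pending = fromTo;
        foreach (ref field; outputBuf)
            field = null;
    }

    // Assumption: fields arrive in increasing fieldIndex order.
    // Returns the number of output slots this field filled (0 if it is not selected).
    size_t processNextField(size_t fieldIndex, C[] fieldValue)
    {
        size_t numFilled = 0;
        while (pending.length > 0 && pending[0].from == fieldIndex)
        {
            outputBuf[pending[0].to] = fieldValue;
            pending = pending[1 .. $];
            ++numFilled;
        }
        return numFilled;
    }

    bool allFieldsFilled() const { return pending.length == 0; }

    C[][] outputFields() { return outputBuf; }
}

// Runs one tab-separated line through the sketch. Returns the reordered fields
// joined by tabs (copied with idup), or null if the line has too few fields.
string reorderOneLine(FieldReorderSketch!char reorder, char[] line)
{
    reorder.initNewLine();
    foreach (fieldIndex, fieldValue; line.splitter('\t').enumerate)
    {
        reorder.processNextField(fieldIndex, fieldValue);
        if (reorder.allFieldsFilled) break;
    }
    return reorder.allFieldsFilled ? reorder.outputFields.join('\t').idup : null;
}

void main()
{
    const size_t[] indices = [3, 0, 2];
    auto reorder = new FieldReorderSketch!char(indices);
    writeln(reorderOneLine(reorder, "a\tb\tc\td".dup));   // fields d, a, c separated by tabs
    writeln(reorderOneLine(reorder, "a\tb".dup) is null); // true: field 3 is missing
}
```

Sorting the map once in the constructor keeps per-line work to a single pass over the fields, and the `.idup` in reorderOneLine is what makes the result safe to keep after the line buffer is reused.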
