tsv_utils.common.utils

Utilities used by tsv-utils applications, including InputFieldReordering, BufferedOutputRange, and several others.

Utilities in this file:

  • InputFieldReordering - A class that creates a reordered subset of fields from an input line. Fields in the subset are accessed by array indices. This is especially useful when processing the subset in a specific order, such as the order listed on the command line at run-time.
  • BufferedOutputRange - An OutputRange with an internal buffer used to buffer output. Intended for use with stdout, where it provides a significant performance benefit.
  • isFlushableOutputRange - Tests if something is an OutputRange with a flush member.
  • bufferedByLine - An input range that reads from a File handle line by line. It is similar to the standard library method std.stdio.File.byLine, but quite a bit faster. This is achieved by reading in larger blocks and buffering.
  • InputSourceRange - An input range that provides open file access to a set of files. It is used to iterate over files passed as command line arguments. This enables reading the header line of a file during command line argument processing, then passing the open file to the main processing functions.
  • ByLineSourceRange - Similar to an InputSourceRange, except that it provides access to a byLine iterator (bufferedByLine) rather than an open file. This is used by tools that run the same processing logic on both header and non-header lines.
  • isBufferableInputSource - Tests if a file or input range can be read in a buffered fashion by inputSourceByChunk.
  • inputSourceByChunk - Returns a range that reads from a file handle (File) or a ubyte input range a chunk at a time.
  • joinAppend - A function that performs a join and appends the join output to an output stream. It is a performance improvement over using join or joiner with writeln.
  • getTsvFieldValue - A convenience function when only a single value is needed from an input line.
  • throwIfWindowsNewline - A utility for detecting Windows newlines in input.

Copyright (c) 2015-2020, eBay Inc. Initially written by Jon Degenhardt

Members

Aliases

EnablePartialLines
alias EnablePartialLines = Flag!"enablePartialLines"

Flag used by the InputFieldReordering template.

NewlineWasRemoved
alias NewlineWasRemoved = Flag!"newlineWasRemoved"

Yes|No.newlineWasRemoved is a template parameter to throwIfWindowsNewline. A Yes value indicates the Unix newline was already removed, as might be done via std.stdio.File.byLine or a similar mechanism.

ReadHeader
alias ReadHeader = Flag!"readHeader"

Flag used by InputSourceRange to determine if the header line should be read when opening a file.

Classes

ByLineSource
class ByLineSource(KeepTerminator keepTerminator, Char = char, ubyte terminator = '\n')

ByLineSource is a class of objects produced by iterating over a ByLineSourceRange.

ByLineSourceRange
class ByLineSourceRange(KeepTerminator keepTerminator = No.keepTerminator, Char = char, ubyte terminator = '\n')

ByLineSourceRange is an input range that iterates over a set of input files. It provides bufferedByLine access to each file.

InputFieldReordering
class InputFieldReordering(C, EnablePartialLines partialLinesOk = EnablePartialLines.no)

InputFieldReordering moves selected fields from an input line to an output array, reordering them along the way.
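The reordering idea can be sketched with standard library calls only. This is an illustration, not the class's implementation: reorderFields and its parameters are hypothetical names, and the sketch assumes zero-based field indices and leaves an output slot empty when the line has too few fields.

```d
import std.algorithm : splitter;

/// Hypothetical helper (not part of tsv_utils): copies the fields named in
/// `fieldIndices` into an output array, in the order the indices are listed.
const(char)[][] reorderFields(const(char)[] line, const size_t[] fieldIndices, char delim)
{
    auto output = new const(char)[][](fieldIndices.length);
    size_t fieldNum = 0;

    // Walk the input fields once, left to right.
    foreach (field; line.splitter(delim))
    {
        // Store the field in every output position that asked for it.
        foreach (outPos, wanted; fieldIndices)
        {
            if (wanted == fieldNum) output[outPos] = field;
        }
        ++fieldNum;
    }
    return output;
}

void main()
{
    // Request field 3 first, then field 0.
    auto fields = reorderFields("a\tb\tc\td", [3, 0], '\t');
    assert(fields.length == 2);
    assert(fields[0] == "d");
    assert(fields[1] == "a");
}
```

The output slices refer to the input line, so they are only valid while the line buffer is unchanged.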

InputSource
class InputSource

InputSource is a class of objects produced by iterating over an InputSourceRange.

InputSourceRange
class InputSourceRange

InputSourceRange is an input range that iterates over a set of input files.

Functions

bufferedByLine
auto bufferedByLine(File file)

bufferedByLine is a performance enhancement over std.stdio.File.byLine. It works by reading a large buffer from the input stream rather than just a single line.

byLineSourceRange
auto byLineSourceRange(string[] filepaths)

byLineSourceRange is a helper function for creating new ByLineSourceRange objects.

getTsvFieldValue
T getTsvFieldValue(C[] line, size_t fieldIndex, C delim)

getTsvFieldValue extracts the value of a single field from a delimited text string.
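A minimal sketch of single-field extraction, using only the standard library. The function name fieldValue is hypothetical, and the zero-based index and the choice to throw on a short line are assumptions of this sketch rather than documented behavior of getTsvFieldValue.

```d
import std.algorithm : splitter;
import std.conv : to;
import std.exception : enforce;

/// Hypothetical stand-in (not the library function): returns the field at
/// zero-based position `fieldIndex`, converted to type T.
T fieldValue(T)(const(char)[] line, size_t fieldIndex, char delim)
{
    auto fields = line.splitter(delim);

    // Skip the fields that precede the requested one.
    foreach (_; 0 .. fieldIndex)
    {
        enforce(!fields.empty, "Line has too few fields.");
        fields.popFront();
    }
    enforce(!fields.empty, "Line has too few fields.");

    // std.conv.to throws a ConvException if the text cannot be converted.
    return fields.front.to!T;
}

void main()
{
    assert(fieldValue!int("10\t20\t30", 1, '\t') == 20);
    assert(fieldValue!string("a\tb\tc", 2, '\t') == "c");
}
```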

inputSourceByChunk
auto inputSourceByChunk(InputSource source, size_t size)
auto inputSourceByChunk(InputSource source, ubyte[] buffer)

inputSourceByChunk returns a range that reads from either a file handle (File) or a ubyte[] array a chunk at a time.

inputSourceRange
InputSourceRange inputSourceRange(string[] filepaths, ReadHeader readHeader)

inputSourceRange is a helper function for creating new InputSourceRange objects.

joinAppend
OutputRange joinAppend(InputRange inputRange, OutputRange outputRange, E delimiter)

joinAppend performs a join operation on an input range, appending the results to an output range.
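The join-and-append pattern might look like the following sketch. joinInto is a hypothetical name, the delimiter is fixed to a char for simplicity, and the output range is assumed to be passed by reference. The point is that elements go straight into the output range without building an intermediate joined string.

```d
import std.array : appender;
import std.range.primitives : put;

/// Hypothetical stand-in (not the library function): writes each element of
/// `items` to `output`, separated by `delimiter`.
void joinInto(R, Out)(R items, ref Out output, char delimiter)
{
    bool first = true;
    foreach (item; items)
    {
        // Emit the delimiter before every element except the first.
        if (!first) put(output, delimiter);
        put(output, item);
        first = false;
    }
}

void main()
{
    auto buf = appender!(char[])();
    joinInto(["a", "b", "c"], buf, '\t');
    put(buf, '\n');
    assert(buf.data == "a\tb\tc\n");
}
```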

throwIfWindowsNewline
void throwIfWindowsNewline(char[] line, char[] filename, size_t lineNum)

throwIfWindowsNewline throws an exception if the 'line' argument ends with a Windows/DOS line ending. This is used by TSV Utilities tools to detect Windows/DOS line endings and terminate processing with an error message to the user.
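A sketch of the check, assuming the Unix newline has already been removed from the line (the Yes.newlineWasRemoved case), so that a Windows line ending shows up as a trailing carriage return. The name checkNoWindowsNewline and the error message wording are hypothetical.

```d
import std.exception : assertThrown, enforce;
import std.format : format;

/// Hypothetical stand-in (not the library function): throws if `line` ends
/// with '\r'. Assumes the trailing '\n' was already stripped.
void checkNoWindowsNewline(const char[] line, const char[] filename, size_t lineNum)
{
    enforce(line.length == 0 || line[$ - 1] != '\r',
            format("Windows/DOS line ending found. File: %s, Line: %s",
                   filename, lineNum));
}

void main()
{
    // A line with Unix endings passes silently.
    checkNoWindowsNewline("a\tb", "data.tsv", 1);

    // A trailing carriage return triggers the exception.
    assertThrown(checkNoWindowsNewline("a\tb\r", "data.tsv", 2));
}
```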

Structs

BufferedOutputRange
struct BufferedOutputRange(OutputTarget)

BufferedOutputRange is a performance enhancement over writing directly to an output stream. It holds a File open for write or an OutputRange. Output is accumulated in an internal buffer and written to the output stream as a block.
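The accumulate-then-write-as-a-block behavior can be sketched as follows. SimpleBufferedOutput, its flushSize parameter, and the 4096-byte default are hypothetical and chosen for illustration. The sketch wraps an arbitrary output range rather than a File, and it requires an explicit flush call at the end.

```d
import std.array : Appender, appender;
import std.range.primitives : rangePut = put;

/// Hypothetical illustration (not the library struct): collects output in
/// an internal buffer and forwards it to the sink in blocks.
struct SimpleBufferedOutput(Sink)
{
    private Sink* sink;
    private Appender!(char[]) buffer;
    private size_t flushSize;

    this(ref Sink target, size_t flushSize = 4096)
    {
        this.sink = &target;
        this.flushSize = flushSize;
    }

    /// Appends to the buffer; forwards to the sink once the buffer is full.
    void put(const(char)[] data)
    {
        buffer.put(data);
        if (buffer.data.length >= flushSize) flush();
    }

    /// Writes any buffered data to the sink and empties the buffer.
    void flush()
    {
        rangePut(*sink, buffer.data);
        buffer.clear();
    }
}

void main()
{
    auto sink = appender!(char[])();
    auto output = SimpleBufferedOutput!(typeof(sink))(sink, 8);

    output.put("abc");
    assert(sink.data.length == 0);      // Still held in the buffer.

    output.put("defgh");                // Buffer reaches 8 bytes, so it is flushed.
    assert(sink.data == "abcdefgh");

    output.put("xy");
    output.flush();                     // Explicit flush of the remainder.
    assert(sink.data == "abcdefghxy");
}
```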

Variables

isBufferableInputSource
enum bool isBufferableInputSource(R);

Defines the 'bufferable' input sources supported by inputSourceByChunk.

isFlushableOutputRange
enum bool isFlushableOutputRange(R, E = char);

isFlushableOutputRange returns true if R is an output range with a flush member.
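A compile-time test of this kind might be written as in the sketch below. hasFlush, FlushableSink, and PlainSink are hypothetical names, and the sketch only checks that a flush call compiles, which may be looser than the library's own definition.

```d
import std.range.primitives : isOutputRange;

/// Hypothetical trait (not the library definition): true when R is an output
/// range for E and `r.flush()` compiles.
enum bool hasFlush(R, E = char) =
    isOutputRange!(R, E) && is(typeof({ R r = R.init; r.flush(); }));

/// Example sink with both put and flush.
struct FlushableSink
{
    void put(char c) {}
    void flush() {}
}

/// Example sink with put only.
struct PlainSink
{
    void put(char c) {}
}

void main()
{
    static assert(hasFlush!FlushableSink);
    static assert(!hasFlush!PlainSink);
    static assert(!hasFlush!int);       // Not an output range at all.
}
```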

Meta

License

Boost Licence 1.0 (http://boost.org/LICENSE_1_0.txt)