tsv_utils.common.utils

Utilities used by tsv-utils applications. InputFieldReordering, BufferedOutputRange, and several others.

Utilities in this file:

  • InputFieldReordering - A class that creates a reordered subset of fields from an input line. Fields in the subset are accessed by array indices. This is especially useful when processing the subset in a specific order, such as the order listed on the command line at run time.
  • BufferedOutputRange - An OutputRange with an internal buffer used to buffer output. Intended for use with stdout, where it provides a significant performance benefit.
  • isFlushableOutputRange - Tests if something is an OutputRange with a flush member.
  • bufferedByLine - An input range that reads from a File handle line by line. It is similar to the standard library method std.stdio.File.byLine, but quite a bit faster. This is achieved by reading in larger blocks and buffering.
  • InputSourceRange - An input range that provides open file access to a set of files. It is used to iterate over files passed as command line arguments. This enables reading the header line of a file during command line argument processing, then passing the open file to the main processing functions.
  • ByLineSourceRange - Similar to an InputSourceRange, except that it provides access to a byLine iterator (bufferedByLine) rather than an open file. This is used by tools that run the same processing logic on both header and non-header lines.
  • isBufferableInputSource - Tests if a file or input range can be read in a buffered fashion by inputSourceByChunk.
  • inputSourceByChunk - Returns a range that reads from a file handle (File) or a ubyte input range a chunk at a time.
  • joinAppend - A function that performs a join, but appends the join output to an output stream. It is a performance improvement over using join or joiner with writeln.
  • getTsvFieldValue - A convenience function when only a single value is needed from an input line.
  • throwIfWindowsNewline - A utility for detecting Windows newlines in input.

Copyright (c) 2015-2021, eBay Inc. Initially written by Jon Degenhardt

Members

Aliases

EnablePartialLines
alias EnablePartialLines = Flag!"enablePartialLines"

Flag used by the InputFieldReordering template.

LineBuffered
alias LineBuffered = Flag!"lineBuffered"

Flag accepted by input buffering ranges to indicate if data should be read using line buffering. Input is read as soon as lines are available when line buffered mode is used.

NewlineWasRemoved
alias NewlineWasRemoved = Flag!"newlineWasRemoved"

Yes|No.newlineWasRemoved is a template parameter to throwIfWindowsNewline. A Yes value indicates the Unix newline was already removed, as might be done via std.stdio.File.byLine or a similar mechanism.

ReadHeader
alias ReadHeader = Flag!"readHeader"

Flag accepted by input buffering ranges to indicate if the header line should be read when opening a file.

Classes

ByLineSource
class ByLineSource(KeepTerminator keepTerminator, Char = char, ubyte terminator = '\n')

ByLineSource is a class of objects produced by iterating over a ByLineSourceRange.

ByLineSourceRange
class ByLineSourceRange(KeepTerminator keepTerminator = No.keepTerminator, Char = char, ubyte terminator = '\n')

ByLineSourceRange is an input range that iterates over a set of input files. It provides bufferedByLine access to each file.

InputFieldReordering
class InputFieldReordering(C, EnablePartialLines partialLinesOk = EnablePartialLines.no)

InputFieldReordering moves selected fields from an input line to an output array, reordering them along the way.
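The reordering idea can be illustrated with a minimal sketch. It is not the class's implementation: reorderFieldsSketch is a hypothetical function written for this example, and it assumes the selected indices are zero-based and given in the desired output order.

```d
import std.algorithm.iteration : splitter;

/// Hypothetical helper: copies the fields named in `fieldIndices` into an
/// output array, in the order the indices are listed.
string[] reorderFieldsSketch(string line, char delim, size_t[] fieldIndices)
{
    auto result = new string[fieldIndices.length];
    size_t fieldIndex = 0;
    foreach (field; line.splitter(delim))
    {
        // An input field may be requested at more than one output position.
        foreach (outPos, wanted; fieldIndices)
        {
            if (wanted == fieldIndex) result[outPos] = field;
        }
        fieldIndex++;
    }
    return result;
}

unittest
{
    // Select field 3, then field 0.
    assert(reorderFieldsSketch("a\tb\tc\td", '\t', [3, 0]) == ["d", "a"]);
}
```

The output array is indexed by position in the selection list, which is what makes it convenient to process fields in the order given on the command line.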

InputSource
class InputSource

InputSource is a class of objects produced by iterating over an InputSourceRange.

InputSourceRange
class InputSourceRange

InputSourceRange is an input range that iterates over a set of input files.

Enums

BufferedOutputRangeDefaults
enum BufferedOutputRangeDefaults

BufferedOutputRangeDefaults defines the parameter defaults used by BufferedOutputRange. These can be passed to the BufferedOutputRange constructor when mixing specific settings with defaults.

Functions

bufferedByLine
auto bufferedByLine(File file, LineBuffered lineBuffered, ReadHeader readHeader)

bufferedByLine is a performance enhancement over std.stdio.File.byLine. It works by reading a large buffer from the input stream rather than just a single line.
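As a rough illustration of the block-reading idea, the sketch below splits a sequence of in-memory blocks into lines, carrying a partial line across block boundaries. It is an assumption-laden simplification, not the library's code: linesFromBlocksSketch is a hypothetical name, the blocks stand in for reads from a file, and the terminator is assumed to be '\n'.

```d
/// Hypothetical helper: turns a sequence of input blocks into lines.
/// A line that straddles two blocks is completed from the next block.
string[] linesFromBlocksSketch(string[] blocks)
{
    string[] lines;
    char[] pending;   // Text seen since the last newline.
    foreach (block; blocks)
    {
        size_t start = 0;
        foreach (i, c; block)
        {
            if (c == '\n')
            {
                lines ~= (pending ~ block[start .. i]).idup;
                pending.length = 0;
                start = i + 1;
            }
        }
        pending ~= block[start .. $];
    }
    // A final line without a trailing newline is still returned.
    if (pending.length > 0) lines ~= pending.idup;
    return lines;
}

unittest
{
    assert(linesFromBlocksSketch(["ab\ncd", "ef\ng", "h\n"]) == ["ab", "cdef", "gh"]);
}
```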

byLineSourceRange
auto byLineSourceRange(string[] filepaths, LineBuffered lineBuffered, ReadHeader readHeader)

byLineSourceRange is a helper function for creating new ByLineSourceRange objects.

getTsvFieldValue
T getTsvFieldValue(C[] line, size_t fieldIndex, C delim)

getTsvFieldValue extracts the value of a single field from a delimited text string.
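A minimal sketch of single-field extraction, using only the standard library, might look as follows. fieldValueSketch is a hypothetical name, and throwing when the line has too few fields is an assumption of this sketch rather than documented behavior.

```d
import std.algorithm.iteration : splitter;
import std.conv : to;

/// Hypothetical helper: returns the field at `fieldIndex` (zero-based),
/// converted to type T.
T fieldValueSketch(T)(const(char)[] line, size_t fieldIndex, char delim)
{
    size_t i = 0;
    foreach (field; line.splitter(delim))
    {
        if (i == fieldIndex) return field.to!T;
        i++;
    }
    throw new Exception("Line has too few fields.");
}

unittest
{
    assert(fieldValueSketch!string("a\t12\tc", 0, '\t') == "a");
    assert(fieldValueSketch!int("a\t12\tc", 1, '\t') == 12);
}
```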

inputSourceByChunk
auto inputSourceByChunk(InputSource source, size_t size)
auto inputSourceByChunk(InputSource source, ubyte[] buffer)

inputSourceByChunk returns a range that reads either a file handle (File) or a ubyte[] array a chunk at a time.

inputSourceRange
InputSourceRange inputSourceRange(string[] filepaths, ReadHeader readHeader)

inputSourceRange is a helper function for creating new InputSourceRange objects.

joinAppend
OutputRange joinAppend(InputRange inputRange, OutputRange outputRange, E delimiter)

joinAppend performs a join operation on an input range, appending the results to an output range.
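The following sketch shows the general idea of joining directly into an output range instead of building an intermediate string. joinAppendSketch is a hypothetical function for illustration. Unlike the signature above, it takes the output range by reference and returns nothing.

```d
import std.array : appender;
import std.range.primitives : put;

/// Hypothetical helper: writes each item to `output`, separated by
/// `delimiter`, without allocating a joined string first.
void joinAppendSketch(R, Out, E)(R items, ref Out output, E delimiter)
{
    bool first = true;
    foreach (item; items)
    {
        if (!first) put(output, delimiter);
        put(output, item);
        first = false;
    }
}

unittest
{
    auto buf = appender!string();
    joinAppendSketch(["a", "b", "c"], buf, '\t');
    put(buf, '\n');
    assert(buf.data == "a\tb\tc\n");
}
```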

throwIfWindowsNewline
void throwIfWindowsNewline(char[] line, char[] filename, size_t lineNum)

throwIfWindowsNewline throws an exception if the 'line' argument ends with a Windows/DOS line ending. This is used by TSV Utilities tools to detect Windows/DOS line endings and terminate processing with an error message to the user.
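A simplified sketch of the check is shown below. It covers only the case where the Unix newline has already been removed, so a Windows/DOS ending shows up as a trailing carriage return. throwIfCarriageReturnSketch and its error message are hypothetical, not the library's.

```d
import std.format : format;

/// Hypothetical helper: assumes the trailing '\n' was already stripped,
/// so a Windows/DOS line ends in '\r'.
void throwIfCarriageReturnSketch(const char[] line, const char[] filename, size_t lineNum)
{
    if (line.length > 0 && line[$ - 1] == '\r')
    {
        throw new Exception(
            format("Windows/DOS line ending found. File: %s, Line: %s", filename, lineNum));
    }
}

unittest
{
    import std.exception : assertNotThrown, assertThrown;

    assertThrown!Exception(throwIfCarriageReturnSketch("a\tb\r", "data.tsv", 1));
    assertNotThrown!Exception(throwIfCarriageReturnSketch("a\tb", "data.tsv", 1));
}
```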

Structs

BufferedOutputRange
struct BufferedOutputRange(OutputTarget)

BufferedOutputRange is a performance enhancement over writing directly to an output stream. It holds a File open for write or an OutputRange. Output is accumulated in an internal buffer and written to the output stream as a block.
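The accumulate-then-write-a-block behavior can be sketched as follows. This is an illustration under stated assumptions, not the struct's implementation: BufferedOutputSketch is a hypothetical type, the output target is modeled as a delegate rather than a File or OutputRange, and the flush threshold is a made-up parameter.

```d
import std.array : Appender;

/// Hypothetical buffered writer: collects output and passes it to `sink`
/// as a single block once `flushSize` characters have accumulated.
struct BufferedOutputSketch
{
    private void delegate(const(char)[]) sink;   // Stand-in for the output stream.
    private Appender!(char[]) buffer;
    private size_t flushSize;

    this(void delegate(const(char)[]) sink, size_t flushSize = 1024)
    {
        this.sink = sink;
        this.flushSize = flushSize;
    }

    void put(const(char)[] data)
    {
        buffer.put(data);
        if (buffer.data.length >= flushSize) flush();
    }

    /// Writes any buffered text as one block and empties the buffer.
    void flush()
    {
        if (buffer.data.length > 0)
        {
            sink(buffer.data);
            buffer.clear();
        }
    }
}

unittest
{
    string written;
    size_t writes;
    auto output = BufferedOutputSketch((const(char)[] s) { written ~= s.idup; writes++; }, 8);

    output.put("abc\n");     // 4 characters: held in the buffer.
    assert(writes == 0);
    output.put("defgh\n");   // 10 characters buffered: written as one block.
    assert(writes == 1 && written == "abc\ndefgh\n");
    output.put("x");
    output.flush();          // Remaining text must be flushed explicitly here.
    assert(writes == 2 && written == "abc\ndefgh\nx");
}
```

Each call to the sink stands in for a write to the underlying stream, so fewer sink calls corresponds to fewer writes.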

Variables

isBufferableInputSource
enum bool isBufferableInputSource(R);

Defines the 'bufferable' input sources supported by inputSourceByChunk.

isFlushableOutputRange
enum bool isFlushableOutputRange(R, E = char);

isFlushableOutputRange returns true if R is an output range with a flush member.
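A trait of this kind can be sketched with standard compile-time checks. The definition below is an assumption about how such a test could be written, not the library's definition. isFlushableSketch, WithFlush, and NoFlush are hypothetical names.

```d
import std.range.primitives : isOutputRange;

/// Hypothetical trait: true if R is an output range of E and `r.flush()` compiles.
enum bool isFlushableSketch(R, E = char) =
    isOutputRange!(R, E) && is(typeof((R r) { r.flush(); }));

/// Example types for the checks below.
struct WithFlush { void put(char c) {} void flush() {} }
struct NoFlush { void put(char c) {} }

static assert(isFlushableSketch!WithFlush);
static assert(!isFlushableSketch!NoFlush);   // Output range, but no flush member.
static assert(!isFlushableSketch!int);       // Not an output range at all.
```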

Meta

License

Boost License 1.0 (http://boost.org/LICENSE_1_0.txt)